TensorFlow Object Detection API: printing the detected object's label name

I am following Nicholas Renotte's tutorial on real-time hand-sign detection with TensorFlow and OpenCV, and I have finished the code:
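import cv2
import numpy as np
import time
import tensorflow as tf
from object_detection.utils import label_map_util
from object_detection.utils import visualization_utils as viz_utils
# ANNOTATION_PATH and detect_fn are defined in earlier steps of the tutorial
category_index = label_map_util.create_category_index_from_labelmap(ANNOTATION_PATH+'/label_map.pbtxt')
cap = cv2.VideoCapture(0)
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
while True:
    ret, frame = cap.read()
    image_np = np.array(frame)
    input_tensor = tf.convert_to_tensor(np.expand_dims(image_np, 0), dtype=tf.float32)
    detections = detect_fn(input_tensor)
    num_detections = int(detections.pop('num_detections'))
    detections = {key: value[0, :num_detections].numpy()
                  for key, value in detections.items()}
    detections['num_detections'] = num_detections
    # detection_classes should be ints.
    detections['detection_classes'] = detections['detection_classes'].astype(np.int64)
    label_id_offset = 1
    image_np_with_detections = image_np.copy()
    # Draw the boxes and labels onto the copy of the frame
    viz_utils.visualize_boxes_and_labels_on_image_array(
        image_np_with_detections,
        detections['detection_boxes'],
        detections['detection_classes'] + label_id_offset,
        detections['detection_scores'],
        category_index,
        use_normalized_coordinates=True)
    cv2.imshow('object detection', cv2.resize(image_np_with_detections, (800, 600)))
    # Press 'q' to stop the webcam loop
    if cv2.waitKey(1) & 0xFF == ord('q'):
        cap.release()
        break
detections = detect_fn(input_tensor)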
This code runs fine: it recognizes the hand sign, draws a box around it, and labels it. However, I also want to print the name of the recognized hand sign in the terminal (so I can use it with pyttsx3 to speak out the detected sign).
I tried printing detections['detection_classes'], but that only gives some sort of array as output. Can anyone explain how I can print the name of the detected object along with its score?
Thanks in advance. This is my first post on Stack Overflow, so please go easy on me.

detections['detection_classes'] holds the category id of each detected bounding box.
A category index is a dictionary that maps integer ids to dicts
containing categories, e.g. {1: {'id': 1, 'name': 'dog'}, 2: {'id': 2, 'name': 'cat'}, ...}.
So if you print category_index, you will get something like this:
{1: {'id': 1, 'name': 'Aa'}, 2: {'id': 2, 'name': 'Bb'}, ...}
assuming you are dealing with hand signs of alphabets.
With this knowledge, it is easy to print the label for the hand-sign detected.
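# flatten the category_index to a single dictionary
category_dict = {value.get('id'): value.get('name') for _, value in category_index.items()}
detected_signs = []
for sign_index, score in zip(detections['detection_classes'], detections['detection_scores']):
    # the raw class ids are 0-based (hence label_id_offset in the question); label-map ids start at 1
    sign_label = category_dict.get(sign_index + label_id_offset)
    detected_signs.append((sign_label, score))
    print(sign_label, score)
# Feed detected_signs to a downstream system like pyttsx3 to speak out the sign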
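The loop above prints every detection, including low-confidence ones. A minimal sketch of a helper that keeps only detections above a score threshold and returns (name, score) pairs, assuming the category_index layout shown above and, as in the question's visualization call, 0-based raw class ids (so label_id_offset = 1; set it to 0 if your detect_fn already returns label-map ids). The name confident_labels is hypothetical.

```python
import numpy as np

def confident_labels(classes, scores, category_index, threshold=0.5, label_id_offset=1):
    """Return (name, score) pairs for detections scoring above threshold.

    classes, scores: 1-D arrays such as detections['detection_classes'] and
    detections['detection_scores'].
    category_index: {id: {'id': id, 'name': name}} as built from the label map.
    label_id_offset: added to each raw class id before the lookup (assumes
    0-based model ids and 1-based label-map ids).
    """
    results = []
    for cls, score in zip(np.asarray(classes), np.asarray(scores)):
        if score > threshold:
            entry = category_index.get(int(cls) + label_id_offset)
            if entry is not None:  # skip ids that are missing from the label map
                results.append((entry['name'], float(score)))
    return results
```

To speak the best sign, you could take name, _ = max(results, key=lambda r: r[1]) when results is non-empty and pass it to pyttsx3 (engine = pyttsx3.init(), engine.say(name), engine.runAndWait()). In a live webcam loop you would probably also want to skip repeating the same sign on every frame.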


How do I output the predictions for the whole test set from this model to an Excel or .txt file?

This code is from the PointNet example. At the end, it plots the point clouds together with the label versus the prediction. However, that figure only shows 8 predictions.
Here is that part of the code:
data = test_dataset.take(1)
points, labels = list(data)[0]
points = points[:8, ...]
labels = labels[:8, ...]
# run test data through model
preds = model.predict(points)
preds = tf.math.argmax(preds, -1)
points = points.numpy()
### plot points with predicted class and label
fig = plt.figure(figsize=(10, 15))
for i in range(8):
    ax = fig.add_subplot(2, 4, i + 1, projection="3d")
    ax.scatter(points[i, :, 0], points[i, :, 1], points[i, :, 2])
    ax.set_title(
        "pred: {:}, label: {:}".format(
            CLASS_MAP[preds[i].numpy()], CLASS_MAP[labels.numpy()[i]]
        )
    )
    ax.set_axis_off()
plt.show()
This was my code; I tried writing to a .txt file before Excel, just to test whether it works:
# open the file once; opening it with "w" inside the loop truncates it on every iteration
with open("MODEL_OUTCOMES/predictions.txt", "w") as f:
    for i in range(8):
        f.write("pred: {:}, label: {:}\n".format(
            CLASS_MAP[preds[i].numpy()], CLASS_MAP[labels.numpy()[i]]))
However, when I increase the range, I get this error:
raise core._status_to_exception(e) from None # pylint: disable=protected-access
tensorflow.python.framework.errors_impl.InvalidArgumentError: {{function_node __wrapped__StridedSlice_device_/job:localhost/replica:0/task:0/device:CPU:0}} slice index 10 of dimension 0 out of bounds. [Op:StridedSlice] name: strided_slice/
And when I loop with
for i in preds.numpy():
I only get one outcome:
pred: Chair, label: Chair
I would just like to write out the predictions versus labels for the entire test set (290 samples).
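One way to cover the whole test set is to loop over every batch of test_dataset instead of using take(1) and the [:8] slice (that slice leaves only eight samples, so larger indices are out of bounds), and write one row per sample. A minimal sketch, assuming test_dataset yields (points, labels) batches and model.predict returns per-class scores as in the PointNet example; the helper write_all_predictions is hypothetical. It writes CSV, which Excel can open directly.

```python
import csv
import numpy as np

def write_all_predictions(dataset, predict_fn, class_map, path):
    """Write one 'pred,label' row per sample to a CSV file; return the row count.

    dataset    -- iterable of (points, labels) batches, e.g. test_dataset
    predict_fn -- returns class scores of shape (batch, num_classes), e.g. model.predict
    class_map  -- dict mapping class index -> class name (CLASS_MAP in the question)
    """
    rows = 0
    with open(path, "w", newline="") as f:  # open once, outside the loop
        writer = csv.writer(f)
        writer.writerow(["pred", "label"])
        for points, labels in dataset:
            preds = np.argmax(np.asarray(predict_fn(points)), axis=-1)
            for pred, label in zip(preds, np.asarray(labels)):
                writer.writerow([class_map[int(pred)], class_map[int(label)]])
                rows += 1
    return rows
```

Usage would be along the lines of write_all_predictions(test_dataset, model.predict, CLASS_MAP, "MODEL_OUTCOMES/predictions.csv"). Because points and labels come from the same batch, the pairs stay aligned even if the dataset was shuffled.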

TensorFlow Lite only uses the first item in the labelmap.txt file when identifying items

I have installed TensorFlow 1.15 and created a custom model. I converted it into a .tflite file so TensorFlow Lite can read it. Then I ran the following code:
import os
import argparse
import cv2
import numpy as np
import sys
import glob
import importlib.util
parser = argparse.ArgumentParser()
parser.add_argument('--modeldir', help='Folder the .tflite file is located in', required=True)
parser.add_argument('--graph', help='Name of the .tflite file, if different than detect.tflite', default='detect.tflite')
parser.add_argument('--labels', help='Name of the labelmap file, if different than labelmap.txt', default='labelmap.txt')
parser.add_argument('--threshold', help='Minimum confidence threshold for displaying detected objects', default=0.5)
parser.add_argument('--image', help='Name of the single image to perform detection on. To run detection on multiple images, use --imagedir', default=None)
parser.add_argument('--imagedir', help='Name of the folder containing images to perform detection on. Folder must contain only images.', default=None)
parser.add_argument('--edgetpu', help='Use Coral Edge TPU Accelerator to speed up detection', action='store_true')
args = parser.parse_args()
MODEL_NAME = args.modeldir
GRAPH_NAME = args.graph
LABELMAP_NAME = args.labels
min_conf_threshold = float(args.threshold)
use_TPU = args.edgetpu
IM_NAME = args.image
IM_DIR = args.imagedir
if (IM_NAME and IM_DIR):
    print('Error! Please only use the --image argument or the --imagedir argument, not both. Issue "python TFLite_detection_image.py -h" for help.')
    sys.exit()
if (not IM_NAME and not IM_DIR):
    IM_NAME = 'test1.jpg'
# Use tflite_runtime if it is installed, otherwise fall back to full TensorFlow
pkg = importlib.util.find_spec('tflite_runtime')
if pkg:
    from tflite_runtime.interpreter import Interpreter
    if use_TPU:
        from tflite_runtime.interpreter import load_delegate
else:
    from tensorflow.lite.python.interpreter import Interpreter
    if use_TPU:
        from tensorflow.lite.python.interpreter import load_delegate
if use_TPU:
    if (GRAPH_NAME == 'detect.tflite'):
        GRAPH_NAME = 'edgetpu.tflite'
CWD_PATH = os.getcwd()
if IM_DIR:
    PATH_TO_IMAGES = os.path.join(CWD_PATH, IM_DIR)
    images = glob.glob(PATH_TO_IMAGES + '/*')
elif IM_NAME:
    PATH_TO_IMAGES = os.path.join(CWD_PATH, IM_NAME)
    images = glob.glob(PATH_TO_IMAGES)
# Paths to the .tflite model and the label map inside the model folder
PATH_TO_CKPT = os.path.join(CWD_PATH, MODEL_NAME, GRAPH_NAME)
PATH_TO_LABELS = os.path.join(CWD_PATH, MODEL_NAME, LABELMAP_NAME)
with open(PATH_TO_LABELS, 'r') as f:
    labels = [line.strip() for line in f.readlines()]
# The COCO starter model's label map begins with a '???' placeholder; drop it
if labels[0] == '???':
    del labels[0]
if use_TPU:
    interpreter = Interpreter(model_path=PATH_TO_CKPT, experimental_delegates=[load_delegate('libedgetpu.so.1.0')])
else:
    interpreter = Interpreter(model_path=PATH_TO_CKPT)
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
height = input_details[0]['shape'][1]
width = input_details[0]['shape'][2]
floating_model = (input_details[0]['dtype'] == np.float32)
input_mean = 127.5
input_std = 127.5
for image_path in images:
    image = cv2.imread(image_path)
    image_rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    imH, imW, _ = image.shape
    image_resized = cv2.resize(image_rgb, (width, height))
    input_data = np.expand_dims(image_resized, axis=0)
    # Normalize pixel values if using a floating (non-quantized) model
    if floating_model:
        input_data = (np.float32(input_data) - input_mean) / input_std
    # Run the model on the image
    interpreter.set_tensor(input_details[0]['index'], input_data)
    interpreter.invoke()
    boxes = interpreter.get_tensor(output_details[0]['index'])[0] # Bounding box coordinates of detected objects
    classes = interpreter.get_tensor(output_details[1]['index'])[0] # Class index of detected objects
    scores = interpreter.get_tensor(output_details[2]['index'])[0] # Confidence of detected objects
    for i in range(len(scores)):
        if ((scores[i] > min_conf_threshold) and (scores[i] <= 1.0)):
            ymin = int(max(1,(boxes[i][0] * imH)))
            xmin = int(max(1,(boxes[i][1] * imW)))
            ymax = int(min(imH,(boxes[i][2] * imH)))
            xmax = int(min(imW,(boxes[i][3] * imW)))
            cv2.rectangle(image, (xmin,ymin), (xmax,ymax), (10, 255, 0), 2)
            object_name = labels[int(classes[i])] # Look up object name from "labels" array using class index
            label = '%s: %d%%' % (object_name, int(scores[i]*100)) # Example: 'person: 72%'
            labelSize, baseLine = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.7, 2) # Get font size
            label_ymin = max(ymin, labelSize[1] + 10) # Make sure not to draw label too close to top of window
            cv2.rectangle(image, (xmin, label_ymin-labelSize[1]-10), (xmin+labelSize[0], label_ymin+baseLine-10), (255, 255, 255), cv2.FILLED) # Draw white box to put label text in
            cv2.putText(image, label, (xmin, label_ymin-7), cv2.FONT_HERSHEY_SIMPLEX, 0.7, (0, 0, 0), 2)
    cv2.imshow('Object detector', image)
    # Press any key to show the next image, or 'q' to quit
    if cv2.waitKey(0) == ord('q'):
        break
cv2.destroyAllWindows()
Now, my custom model seems to work: it locates the items in the image correctly, but it labels everything with the first item in labelmap.txt. For example:
The model identifies the remotes in the images but labels them as "key", because that is the first entry in labelmap.txt. I don't know why this is happening; can someone please help me? Sorry if anything is unclear; please let me know and I will do my best to clarify. Thank you.
I followed the guide at https://github.com/EdjeElectronics/TensorFlow-Lite-Object-Detection-on-Android-and-Raspberry-Pi.
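One way to narrow this down (a debugging sketch, not a confirmed diagnosis) is to look at the raw values in classes for the confident detections. If every id is a whole-number 0, the model itself is predicting class 0. If the values are fractions between 0 and 1, the outputs are probably being read in a different order than the code assumes: int() truncates every such value to 0, so labels[int(classes[i])] always returns the first label. The helper check_class_output is hypothetical.

```python
import numpy as np

def check_class_output(classes, scores, labels, threshold=0.5):
    """Summarize the raw class values of detections scoring above threshold.

    Returns (counts, looks_fractional):
      counts maps each int(class) id to (label or None, number of detections);
      looks_fractional is True when the kept 'class' values are non-integers
      in [0, 1], which suggests the output tensors are read in the wrong order.
    """
    classes = np.asarray(classes, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)
    kept = classes[scores > threshold]
    counts = {}
    for value in kept:
        cid = int(value)  # same truncation as labels[int(classes[i])]
        name = labels[cid] if 0 <= cid < len(labels) else None
        counts[cid] = (name, counts.get(cid, (name, 0))[1] + 1)
    looks_fractional = bool(
        kept.size > 0
        and np.any(kept != np.floor(kept))
        and np.all((kept >= 0.0) & (kept <= 1.0))
    )
    return counts, looks_fractional
```

Inside the image loop you could call print(check_class_output(classes, scores, labels, min_conf_threshold)) right after reading the three output tensors.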

Getting the detected elements before inference in TensorFlow 2

This week I'm "playing" with TensorFlow 2 and trying object detection, and I don't know how to do the following.
The TF2 object detection tutorial runs inference on the elements in one image, as shown in the following code:
image_np = load_image_into_numpy_array(image_path)
input_tensor = tf.convert_to_tensor(image_np)
input_tensor = input_tensor[tf.newaxis, ...]
detections = detect_fn(input_tensor)
But I need to get the detected elements or regions before inference, that is, the coordinates of the proposed regions, and I don't know how to do that. I am trying to split the process into two parts: region proposal on one hand and inference on the other.
My code is the following:
def make_inference(image_path, counter, image_save):
    print('Running inference for {}... '.format(image_path), end='')
    image_np = load_image_into_numpy_array(image_path)
    input_tensor = tf.convert_to_tensor(image_np)
    input_tensor = input_tensor[tf.newaxis, ...]
    detections = detect_fn(input_tensor)
    num_detections = int(detections.pop('num_detections'))
    detections = {key: value[0, :num_detections].numpy()
                  for key, value in detections.items()}
    detections['num_detections'] = num_detections
    detections['detection_classes'] = detections['detection_classes'].astype(np.int64)
    image_np_with_detections = image_np.copy()
    # (drawing the detections onto image_np_with_detections is not shown in the original post)
    nombre = str(counter) + '.jpg'
    plt.savefig('/content/RESULTADOS/' + nombre, dpi=dpi, bbox_inches='tight')
    counter = counter + 1
Thanks in advance.
Having worked as a software engineer, I find that data science code really benefits from object-oriented design (although OOP in Python is a joke, IMO), so I have taken the liberty of writing a class instead, plus the following function that returns a List[DetectedObject].
A simple POJO-style class to hold each detection you receive:
from typing import Dict, Any, Optional, List
import numpy as np
class DetectedObject:
    def __init__(self, ymin: float, xmin: float, ymax: float, xmax: float, category: str, score: float):
        # normalized box coordinates, as returned in detection_boxes
        self.xmin = xmin
        self.ymin = ymin
        self.xmax = xmax
        self.ymax = ymax
        self.category = category
        self.score = score
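Call the following function and pass it the detections you received from detect_fn (after the post-processing shown in the question, so the values are NumPy arrays), together with your category index: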
def get_objects_from_detections(detections: Dict[str, Optional[Any]], categories: Dict[int, Optional[Any]], threshold: float = 0.0) -> List[DetectedObject]:
    det_objs = []
    bbox_list = detections['detection_boxes'].tolist()
    for i, clazz in np.ndenumerate(detections['detection_classes']):
        score = detections['detection_scores'][i]
        if score > threshold:
            clazz_cat = categories[clazz]['name']
            row = bbox_list[i[0]]
            tiny = DetectedObject(row[0], row[1], row[2], row[3], clazz_cat, score)
            det_objs.append(tiny)
    return det_objs
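The coordinates stored in each DetectedObject are normalized [ymin, xmin, ymax, xmax] values between 0 and 1, so to crop or draw a region you need to scale them by the image size. Models can also return values slightly outside that range, so clipping is worthwhile. A minimal sketch; the helper to_pixel_box is hypothetical.

```python
def to_pixel_box(ymin, xmin, ymax, xmax, im_height, im_width):
    """Scale a normalized [ymin, xmin, ymax, xmax] box to integer pixel
    coordinates (left, top, right, bottom), clipped to the image bounds."""
    def clip(value, upper):
        return int(min(max(round(value), 0), upper))
    left = clip(xmin * im_width, im_width)
    top = clip(ymin * im_height, im_height)
    right = clip(xmax * im_width, im_width)
    bottom = clip(ymax * im_height, im_height)
    return left, top, right, bottom
```

For example, for obj in get_objects_from_detections(detections, category_index, 0.5), to_pixel_box(obj.ymin, obj.xmin, obj.ymax, obj.xmax, *image_np.shape[:2]) gives the pixel rectangle, and image_np[top:bottom, left:right] crops that region.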

Why am I getting "IndexError: list index out of range" when training on the cloud?

I resorted to using the cloud training workflow. I expected to drop the resulting model directly into my existing code, which works with other TFLite models, but the cloud-produced model doesn't work: I get "index out of range" when calling interpreter.get_tensor.
Here is my code, basically a modified example that ingests a video and produces a video with the results.
import os
import argparse
import cv2
import numpy as np
import sys
import importlib.util
# Define and parse input arguments
parser = argparse.ArgumentParser()
parser.add_argument('--modeldir', help='Folder the .tflite file is located in',
                    required=True)
parser.add_argument('--graph', help='Name of the .tflite file, if different than detect.tflite',
                    default='detect.tflite')
                    # default='/tmp/detect.tflite')
parser.add_argument('--labels', help='Name of the labelmap file, if different than labelmap.txt',
                    default='labelmap.txt')
                    # default='/tmp/coco_labels.txt')
parser.add_argument('--threshold', help='Minimum confidence threshold for displaying detected objects',
                    default=0.5)
parser.add_argument('--video', help='Name of the video file',
                    default='test.mp4')  # default assumed from the upstream EdjeElectronics script
parser.add_argument('--edgetpu', help='Use Coral Edge TPU Accelerator to speed up detection',
                    action='store_true')
args = parser.parse_args()
MODEL_NAME = args.modeldir
GRAPH_NAME = args.graph
LABELMAP_NAME = args.labels
VIDEO_NAME = args.video
min_conf_threshold = float(args.threshold)
use_TPU = args.edgetpu
# Import TensorFlow libraries
# If tensorflow is not installed, import interpreter from tflite_runtime, else import from regular tensorflow
# If using Coral Edge TPU, import the load_delegate library
pkg = importlib.util.find_spec('tensorflow')
pkg = True  # forces the full-TensorFlow branch below
if pkg is None:
    from tflite_runtime.interpreter import Interpreter
    if use_TPU:
        from tflite_runtime.interpreter import load_delegate
else:
    from tensorflow.lite.python.interpreter import Interpreter
    if use_TPU:
        from tensorflow.lite.python.interpreter import load_delegate
# If using Edge TPU, assign filename for Edge TPU model
if use_TPU:
    # If user has specified the name of the .tflite file, use that name, otherwise use default 'edgetpu.tflite'
    if (GRAPH_NAME == 'detect.tflite'):
        GRAPH_NAME = 'edgetpu.tflite'
# Get path to current working directory
CWD_PATH = os.getcwd()
# Path to video file
VIDEO_PATH = os.path.join(CWD_PATH, VIDEO_NAME)
# Path to .tflite file, which contains the model that is used for object detection
PATH_TO_CKPT = os.path.join(CWD_PATH, MODEL_NAME, GRAPH_NAME)
# Path to label map file
PATH_TO_LABELS = os.path.join(CWD_PATH, MODEL_NAME, LABELMAP_NAME)
# Load the label map
with open(PATH_TO_LABELS, 'r') as f:
    labels = [line.strip() for line in f.readlines()]
# Have to do a weird fix for label map if using the COCO "starter model" from
# https://www.tensorflow.org/lite/models/object_detection/overview
# First label is '???', which has to be removed.
if labels[0] == '???':
    del labels[0]
# Load the Tensorflow Lite model.
# If using Edge TPU, use special load_delegate argument
if use_TPU:
    interpreter = Interpreter(model_path=PATH_TO_CKPT,
                              experimental_delegates=[load_delegate('libedgetpu.so.1.0')])
else:
    interpreter = Interpreter(model_path=PATH_TO_CKPT)
interpreter.allocate_tensors()
# Get model details
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
height = input_details[0]['shape'][1]
width = input_details[0]['shape'][2]
floating_model = (input_details[0]['dtype'] == np.float32)
input_mean = 127.5
input_std = 127.5
# Open video file
video = cv2.VideoCapture(VIDEO_PATH)
imW = video.get(cv2.CAP_PROP_FRAME_WIDTH)
imH = video.get(cv2.CAP_PROP_FRAME_HEIGHT)
out = cv2.VideoWriter('output.avi', cv2.VideoWriter_fourcc(
    'M', 'J', 'P', 'G'), 10, (1920, 1080))
while video.isOpened():
    # Acquire frame and resize to expected shape [1xHxWx3]
    ret, frame = video.read()
    if not ret:
        print('Reached the end of the video!')
        break
    frame_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    frame_resized = cv2.resize(frame_rgb, (width, height))
    input_data = np.expand_dims(frame_resized, axis=0)
    # Normalize pixel values if using a floating model (i.e. if model is non-quantized)
    if floating_model:
        input_data = (np.float32(input_data) - input_mean) / input_std
    # Perform the actual detection by running the model with the image as input
    interpreter.set_tensor(input_details[0]['index'], input_data)
    interpreter.invoke()
    # Retrieve detection results
    boxes = interpreter.get_tensor(output_details[0]['index'])[0] # Bounding box coordinates of detected objects
    classes = interpreter.get_tensor(output_details[1]['index'])[0] # Class index of detected objects
    scores = interpreter.get_tensor(output_details[2]['index'])[0] # Confidence of detected objects
    print(boxes)
    print(classes)
    print(scores)
    #num = interpreter.get_tensor(output_details[3]['index'])[0] # Total number of detected objects (inaccurate and not needed)
    # Loop over all detections and draw detection box if confidence is above minimum threshold
    for i in range(len(scores)):
        if ((scores[i] > min_conf_threshold) and (scores[i] <= 1.0)):
            # Get bounding box coordinates and draw box
            # Interpreter can return coordinates that are outside of image dimensions, need to force them to be within image using max() and min()
            ymin = int(max(1,(boxes[i][0] * imH)))
            xmin = int(max(1,(boxes[i][1] * imW)))
            ymax = int(min(imH,(boxes[i][2] * imH)))
            xmax = int(min(imW,(boxes[i][3] * imW)))
            cv2.rectangle(frame, (xmin,ymin), (xmax,ymax), (10, 255, 0), 4)
            # Draw label
            object_name = labels[int(classes[i])] # Look up object name from "labels" array using class index
            label = '%s: %d%%' % (object_name, int(scores[i]*100)) # Example: 'person: 72%'
            labelSize, baseLine = cv2.getTextSize(label, cv2.FONT_HERSHEY_SIMPLEX, 0.7, 2) # Get font size
            label_ymin = max(ymin, labelSize[1] + 10) # Make sure not to draw label too close to top of window
            cv2.rectangle(frame, (xmin, label_ymin-labelSize[1]-10), (xmin+labelSize[0],
                          label_ymin+baseLine-10), (255, 255, 255), cv2.FILLED) # Draw white box to put label text in
            cv2.putText(frame, label, (xmin, label_ymin-7), cv2.FONT_HERSHEY_SIMPLEX,
                        0.7, (0, 0, 0), 2) # Draw label text
    # All the results have been drawn on the frame, so it's time to display it.
    cv2.imshow('Object detector', frame)
    #output_rgb = cv2.cvtColor(frame, cv2.COLOR_RGB2BGR)
    out.write(frame)  # frames must match the (1920, 1080) size given to the VideoWriter
    # Press 'q' to quit
    if cv2.waitKey(1) == ord('q'):
        break
# Clean up
video.release()
out.release()
cv2.destroyAllWindows()
Here is what the print statements output when using the canned TFLite model:
[32. 76. 56. 76. 0. 61. 74. 0. 0. 0.]
[0.609375 0.48828125 0.44921875 0.44921875 0.4140625 0.40234375
0.37890625 0.3125 0.3125 0.3125 ]
[[-0.01923192 0.17330796 0.747546 0.8384144 ]
[ 0.01866053 0.5023282 0.39603746 0.6143299 ]
[ 0.01673795 0.47382414 0.34407628 0.5580931 ]
[ 0.11588445 0.78543806 0.8778869 1.0039229 ]
[ 0.8106107 0.70675755 1.0080075 0.89248717]
[ 0.84941524 0.06391776 1.0006479 0.28792098]
[ 0.05543692 0.53557926 0.40413857 0.62823087]
[ 0.07051808 -0.00938512 0.8822515 0.28100258]
[ 0.68205094 0.33990026 0.9940187 0.6020821 ]
[ 0.08010477 0.01998334 0.6011186 0.26135433]]
Here is the error when presented with the cloud created model:
File "tflite_vid.py", line 124, in <module>
classes = interpreter.get_tensor(output_details[1]['index'])[0] # Class index of detected objects
IndexError: list index out of range
So I would kindly ask that someone explain how to either develop a TFLite model with TF2 in Python or get the cloud to generate a usable TFLite model. Please, oh please, do not point me in a direction that involves wandering through Internet examples unless they are the actual gospel on how to do this.
The error comes from output_details[1]: index [1] is out of range. Your model may have only one output, but the code tries to access the second one.
For more on loading and running a TFLite model from Python, please refer to https://www.tensorflow.org/lite/guide/inference#load_and_run_a_model_in_python.
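A quick way to confirm this is to print what the model actually exposes before indexing into output_details. A minimal sketch (describe_outputs is a hypothetical helper) that works on the list returned by interpreter.get_output_details(), whose entries are dicts with keys such as 'name', 'index', 'shape' and 'dtype':

```python
def describe_outputs(output_details):
    """Return one line per model output: its position, tensor name, shape and dtype.

    output_details: the list returned by interpreter.get_output_details().
    """
    lines = []
    for position, detail in enumerate(output_details):
        dtype = getattr(detail['dtype'], '__name__', str(detail['dtype']))
        shape = [int(dim) for dim in detail['shape']]
        lines.append("output_details[{}]: name={} shape={} dtype={}".format(
            position, detail['name'], shape, dtype))
    return lines
```

Call it as for line in describe_outputs(interpreter.get_output_details()): print(line) right after allocate_tensors(). If only output_details[0] is listed, the model does not produce the separate boxes, classes and scores tensors the script expects, and the post-processing has to be adapted to whatever that single output contains.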

How to convert numpy.ndarray to tfrecord?

I have a large dataset with two features: the first is the data and the second is the label. The dataset is about 6 GB. When I run the following code:
# data_from_dataset: the data from the 4G dataset, an ndarray with two
#   dimensions, e.g. (a very large number, 15)
# label_from_dataset: the labels from the 4G dataset, also an ndarray with two
#   dimensions, e.g. (a very large number, 15)
data_from_dataset, label_from_dataset = load_train_data()
# calc total batch count
num_batch = len(data_from_dataset) // hp.batch_size
# Convert to tensor
X = tf.convert_to_tensor(data_from_dataset, tf.int32)
Y = tf.convert_to_tensor(label_from_dataset, tf.int32)
# Create Queues
input_queues = tf.train.slice_input_producer([X, Y])
# create batch queues
x, y = tf.train.shuffle_batch(input_queues, ...)  # remaining arguments not shown in the original post
It runs very slowly, and after waiting a long time the console shows the following error:
Error:cannot create a tensor larger than 2GB
The problem seems to be in these lines:
# Convert to tensor
X = tf.convert_to_tensor(data_from_dataset, tf.int32)
Y = tf.convert_to_tensor(label_from_dataset, tf.int32)
So I tried writing the arrays to a TFRecord file and reading them back:
def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
def save_tfrecords(data_from_dataset, label_from_dataset, desfile):
    with tf.python_io.TFRecordWriter(desfile) as writer:
        for i in range(len(data_from_dataset)):
            features = tf.train.Features(
                feature = {
                    "data": _int64_feature(data_from_dataset[i]),
                    "label": _int64_feature(label_from_dataset[i])
                }
            )
            example = tf.train.Example(features = features)
            serialized = example.SerializeToString()
            writer.write(serialized)
def read_and_decode(filename_queue):
    reader = tf.TFRecordReader()
    _, serialized_example = reader.read(filename_queue)
    features = tf.parse_single_example(
        serialized_example,
        features={
            'data': tf.FixedLenFeature([], tf.string),
            'label': tf.FixedLenFeature([], tf.string),
        })
    sent = features['data']
    tag = features['label']
    return sent, tag
save_tfrecords(data_from_dataset, label_from_dataset, fname_out)
filename_queue = tf.train.string_input_producer(fname_out, shuffle=True)
example, label = read_and_decode(filename_queue, 2)
x, y = tf.train.shuffle_batch([example, label], ...)  # remaining arguments not shown in the original post
It reports an error on this line:
def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))
Error:only length-1 arrays can be converted to python scalars
How can I convert a NumPy array to TFRecord? Is there another method?
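tf.train.Int64List expects a flat sequence of integers, so passing value=[value] with a whole 15-element row fails with "only length-1 arrays can be converted to Python scalars". One option is to serialize the array to raw bytes and store it with tf.train.BytesList instead:
def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))
data = np.random.rand(15,)
with tf.python_io.TFRecordWriter('file.tfrecords') as writer:
    raw = data.tobytes()  # same bytes as the deprecated tostring(); also avoids shadowing str
    example = tf.train.Example(features=tf.train.Features(feature={'1': _bytes_feature(raw)}))
    writer.write(example.SerializeToString())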
You can then decode it with tf.decode_raw, or inspect the TFRecord file with:
for str_rec in tf.python_io.tf_record_iterator('file.tfrecords'):
    example = tf.train.Example()
    example.ParseFromString(str_rec)
    raw = example.features.feature['1'].bytes_list.value[0]
    your_data = np.frombuffer(raw, np.float64)  # dtype must match what was written (rand() gives float64)
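Because raw bytes carry no type information, the reader has to know the dtype (and, for multi-dimensional data, the shape) used when writing; reading 15 float64 values as float32, for example, yields 30 meaningless numbers. A NumPy-only sketch of that round trip, with hypothetical helper names. Storing the dtype string and shape as extra features next to the bytes is one possible layout, not something the answer above prescribes.

```python
import numpy as np

def encode_array(array):
    """Serialize an array to raw bytes plus the dtype and shape needed to rebuild it."""
    return array.tobytes(), array.dtype.str, array.shape

def decode_array(raw, dtype, shape):
    """Rebuild the array. np.frombuffer returns a flat read-only view, so reshape and copy."""
    return np.frombuffer(raw, dtype=dtype).reshape(shape).copy()
```

With the question's data, each row from data_from_dataset would go through encode_array before being wrapped in a BytesList feature, and decode_array would be applied to the bytes read back from the file.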