Tensorflow Object Detection API 1-channel image

Is there any way to use the pre-trained models in the Tensorflow Object Detection API, which were trained on RGB images, for single-channel grayscale (depth) images?

I tried the following approach to perform object detection on grayscale (1-channel) images using a pre-trained model (faster_rcnn_resnet101_coco_11_06_2017) in Tensorflow, and it worked for me.
The model was trained on RGB images, so I just had to modify certain code in object_detection_tutorial.ipynb, available in the Tensorflow repo.
First change:
Note that the existing code in the ipynb was written for 3-channel images, so change the load_image_into_numpy_array function as shown:
From
def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    return np.array(image.getdata()).reshape(
        (im_height, im_width, 3)).astype(np.uint8)
To
def load_image_into_numpy_array(image):
    (im_width, im_height) = image.size
    channel_dict = {'L': 1, 'RGB': 3}  # 'L' for grayscale, 'RGB' for 3-channel images
    return np.array(image.getdata()).reshape(
        (im_height, im_width, channel_dict[image.mode])).astype(np.uint8)
Second change: Grayscale images have data in only one channel, but the inference code was written for 3-channel inputs, so to perform object detection we need 3 channels.
This can be achieved in two ways:
a) Duplicate the single-channel data into the two other channels.
b) Fill the other two channels with zeros.
Both will work; I used the first method.
In the ipynb, go to the section where you read the images and convert them into numpy arrays (the for loop at the end of the ipynb).
Change the code from:
for image_path in TEST_IMAGE_PATHS:
    image = Image.open(image_path)
    # the array based representation of the image will be used later in order to prepare the
    # result image with boxes and labels on it.
    image_np = load_image_into_numpy_array(image)
    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    image_np_expanded = np.expand_dims(image_np, axis=0)
To this:
for image_path in TEST_IMAGE_PATHS:
    image = Image.open(image_path)
    # the array based representation of the image will be used later in order to prepare the
    # result image with boxes and labels on it.
    image_np = load_image_into_numpy_array(image)
    if image_np.shape[2] != 3:
        image_np = np.broadcast_to(image_np, (image_np.shape[0], image_np.shape[1], 3)).copy()  # duplicating the content into all 3 channels
        ## Alternative: fill the other two channels with zeros.
        ## This makes the image look red -- not recommended.
        # z = np.zeros(image_np.shape[:-1] + (2,), dtype=image_np.dtype)
        # image_np = np.concatenate((image_np, z), axis=-1)
    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    image_np_expanded = np.expand_dims(image_np, axis=0)
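As a side note (my own sketch, not part of the original change; it assumes PIL is imported as in the tutorial), you can achieve the same duplication by letting PIL convert the image before the numpy conversion:

image = Image.open(image_path)
if image.mode == 'L':
    image = image.convert('RGB')  # replicates the single gray channel into R, G and B
image_np = load_image_into_numpy_array(image)  # now always (height, width, 3)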
That's it. Run the notebook and you should see the results.

Related

image preprocess function for image_dataset_from_directory

In ImageDataGenerator, I used the following function to preprocess images, through the 'preprocessing' keyword in .flow_from_dataframe().
However, I am now trying to use image_dataset_from_directory, which does not work with the preprocess function, as it does not allow embedding this function.
I've tried to apply the preprocess_image() function after the dataset is generated by image_dataset_from_directory, through the .map() function, but it does not work either.
Please could anyone advise?
Many thanks,
Tony
train_Gen = dataGen.flow_from_dataframe(
    df,
    x_col='id_code',
    y_col='diagnosis',
    directory=os.path.join(data_dir, 'train_images'),
    batch_size=BATCH_SIZE,
    target_size=(IMG_WIDTH, IMG_HEIGHT),
    subset='training',
    seed=123,
    class_mode='categorical',
    preprocessing=preprocess_image,
)
def crop_image_from_gray(img, tol=7):
    """
    Applies masks to the original image and
    returns a preprocessed image with 3 channels.
    :param img: A NumPy array that will be cropped
    :param tol: The tolerance used for masking
    :return: A NumPy array containing the cropped image
    """
    # If for some reason we only have two dimensions (grayscale)
    if img.ndim == 2:
        mask = img > tol
        return img[np.ix_(mask.any(1), mask.any(0))]
    # If we have a normal RGB image
    elif img.ndim == 3:
        gray_img = cv2.cvtColor(img, cv2.COLOR_RGB2GRAY)
        mask = gray_img > tol
        check_shape = img[:, :, 0][np.ix_(mask.any(1), mask.any(0))].shape[0]
        if check_shape == 0:  # image is so dark that we would crop out everything,
            return img        # so return the original image
        else:
            img1 = img[:, :, 0][np.ix_(mask.any(1), mask.any(0))]
            img2 = img[:, :, 1][np.ix_(mask.any(1), mask.any(0))]
            img3 = img[:, :, 2][np.ix_(mask.any(1), mask.any(0))]
            img = np.stack([img1, img2, img3], axis=-1)
    return img
def preprocess_image(image, sigmaX=10):
    """
    The whole preprocessing pipeline:
    1. Read in image
    2. Apply masks
    3. Resize image to desired size
    4. Add Gaussian blur to increase robustness
    :param image: A NumPy array that will be cropped
    :param sigmaX: Value used for the GaussianBlur applied to the image
    :return: A NumPy array containing the preprocessed image
    """
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    image = crop_image_from_gray(image)
    image = cv2.resize(image, (IMG_WIDTH, IMG_HEIGHT))
    image = cv2.addWeighted(image, 4, cv2.GaussianBlur(image, (0, 0), sigmaX), -4, 128)
    return image
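One direction that may work (a sketch only, not from the original thread; it assumes TF 2.x, a hypothetical data_dir, and that preprocess_image keeps operating on single unbatched uint8 arrays) is to wrap the NumPy/OpenCV code with tf.numpy_function inside .map(), since .map() by itself passes symbolic tensors that cv2 cannot handle:

import tensorflow as tf

def tf_preprocess(image, label):
    image = tf.cast(image, tf.uint8)  # preprocess_image expects uint8 pixel values
    # tf.numpy_function runs the Python/OpenCV code eagerly on concrete arrays
    image = tf.numpy_function(
        lambda img: preprocess_image(img).astype('float32'), [image], tf.float32)
    image.set_shape((IMG_HEIGHT, IMG_WIDTH, 3))  # shape info is lost by numpy_function
    return image, label

train_ds = tf.keras.utils.image_dataset_from_directory(
    data_dir, image_size=(IMG_HEIGHT, IMG_WIDTH), batch_size=None)  # unbatched
train_ds = train_ds.map(tf_preprocess).batch(BATCH_SIZE)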

How do I print the (frame number, bounding box information, confidence) of object detections to a text file in the tensorflow object detector?

I used the Tensorflow Object Detection API for detecting multiple objects in my videos. However, I have been struggling to figure out how to write the resulting detections to a text/CSV/XML file (basically the bounding box information, the frame number of the image sequence, and the confidence of the bbox).
I've seen several answers on Stack Overflow and GitHub, but most of them were either vague or just not the exact answer I was looking for.
Shown below is the last part of the detection code. I know that detection_boxes and detection_scores are what I need, but I cannot figure out how to write these to a text file, and how to write only the final bbox detections that are drawn on the images, not ALL detection bounding boxes.
for image_path in TEST_IMAGE_PATHS:
    image = Image.open(image_path)
    # the array based representation of the image will be used later in order to prepare the
    # result image with boxes and labels on it.
    image_np = load_image_into_numpy_array(image)
    # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
    image_np_expanded = np.expand_dims(image_np, axis=0)
    # Actual detection.
    output_dict = run_inference_for_single_image(image_np_expanded, detection_graph)
    # Visualization of the results of a detection.
    vis_util.visualize_boxes_and_labels_on_image_array(
        image_np,
        output_dict['detection_boxes'],
        output_dict['detection_classes'],
        output_dict['detection_scores'],
        category_index,
        instance_masks=output_dict.get('detection_masks'),
        use_normalized_coordinates=True,
        line_thickness=8)
    plt.figure(figsize=IMAGE_SIZE)
    plt.imshow(image_np)
You can try the following code:
image = Image.open(image_path)
# the array based representation of the image will be used later in order to prepare the
# result image with boxes and labels on it.
image_np = load_image_into_numpy_array(image)
# Expand dimensions since the model expects images to have shape: [1, None, None, 3]
image_np_expanded = np.expand_dims(image_np, axis=0)
boxes = detection_graph.get_tensor_by_name('detection_boxes:0')
scores = detection_graph.get_tensor_by_name('detection_scores:0')
classes = detection_graph.get_tensor_by_name('detection_classes:0')
num_detections = detection_graph.get_tensor_by_name('num_detections:0')
(boxes, scores, classes, num_detections) = sess.run(
    [boxes, scores, classes, num_detections],
    feed_dict={image_tensor: image_np_expanded})
width = 1024
height = 600
threshold = 0.5
temp = []  # list to store scores greater than 0.5
# iterate through all scores and pick those having a score greater than 0.5
for index, value in enumerate(classes[0]):
    if scores[0, index] > threshold:
        temp.append(scores[0, index])
        # similarly, classes[0, index] will give you the class of the detection
# Actual detection.
output_dict = run_inference_for_single_image(image_np, detection_graph)
# For printing the bounding box coordinates
for i, j in zip(output_dict['detection_boxes'], output_dict['detection_scores']):
    if j > threshold:
        print(i[1] * width, i[0] * height, i[3] * width, i[2] * height)
The above code snippet will give you the bounding box coordinates and the detection scores. You can use a minimum threshold to filter out unnecessary detections. I hope this helps. Also, I could not quite understand what you meant by frame number; could you please clarify what you actually mean by it?
Please let me know if you face any issues.
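If by frame number you mean the index of the image in the sequence, here is a minimal sketch of dumping the kept detections to a CSV file (my own addition, built on the tutorial's run_inference_for_single_image and the same 0.5 threshold as above):

import csv

threshold = 0.5
with open('detections.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['frame', 'xmin', 'ymin', 'xmax', 'ymax', 'score'])
    for frame_number, image_path in enumerate(TEST_IMAGE_PATHS):
        image_np = load_image_into_numpy_array(Image.open(image_path))
        output_dict = run_inference_for_single_image(image_np, detection_graph)
        height, width = image_np.shape[:2]
        for box, score in zip(output_dict['detection_boxes'],
                              output_dict['detection_scores']):
            if score > threshold:  # keep only the boxes that would actually be drawn
                ymin, xmin, ymax, xmax = box  # normalized [0, 1] coordinates
                writer.writerow([frame_number, xmin * width, ymin * height,
                                 xmax * width, ymax * height, score])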

Tensorflow Serving: InvalidArgumentError: Expected image (JPEG, PNG, or GIF), got unknown format starting with 'AAAAAAAAAAAAAAAA'

I'm trying to prepare my custom Keras model for deployment with Tensorflow Serving, but I'm running into issues preprocessing my images.
When I train my model I use the following functions to preprocess my images:
def process_image_from_tf_example(self, image_str_tensor, n_channels=3):
    image = tf.image.decode_image(image_str_tensor)
    image.set_shape([256, 256, n_channels])
    image = tf.cast(image, tf.float32) / 255.0
    return image

def read_and_decode(self, serialized):
    parsed_example = tf.parse_single_example(serialized=serialized, features=self.features)
    input_image = self.process_image_from_tf_example(parsed_example["image_raw"], 3)
    ground_truth_image = self.process_image_from_tf_example(parsed_example["gt_image_raw"], 1)
    return input_image, ground_truth_image
My images are PNGs saved locally, and when I write them to the .tfrecord files I use
tf.gfile.GFile(str(image_path), 'rb').read()
This works: I'm able to train my model and use it for local predictions.
Now I want to deploy my model to be used with Tensorflow Serving. My serving_input_receiver_fn function looks like this:
def serving_input_receiver_fn(self):
    input_ph = tf.placeholder(dtype=tf.string, shape=[None], name='image_bytes')
    images_tensor = tf.map_fn(self.process_image_from_tf_example, input_ph, back_prop=False, dtype=tf.float32)
    return tf.estimator.export.ServingInputReceiver({'input_1': images_tensor}, {'image_bytes': input_ph})
where process_image_from_tf_example is the same function as above, but I get the following error:
InvalidArgumentError (see above for traceback): assertion failed: [Unable to decode bytes as JPEG, PNG, GIF, or BMP]
Reading here, it looks like this error is due to the fact that I'm not using
tf.gfile.GFile(str(image_path), 'rb').read()
as with my training/test files, but I can't use it because I need to send encoded bytes formatted as
{"image_bytes": {'b64': base64.b64encode(image).decode()}}
as requested by TF Serving.
Examples online send JPEG encoded bytes and preprocess the image starting with
tf.image.decode_jpeg(image_buffer, channels=3)
but if I use a different preprocessing function in my serving_input_receiver_fn (different from the one used for training) that starts with
tf.image.decode_png(image_buffer, channels=3)
i get the following error:
InvalidArgumentError (see above for traceback): Expected image (JPEG, PNG, or GIF), got unknown format starting with 'AAAAAAAAAAAAAAAA'
(the same happens with decode_jpeg, by the way)
What am I doing wrong? Do you need more code from me to answer? Thanks a lot!
Edit!!
Changed the title because it was not clear enough
OK I solved it.
image was a numpy array, but I had to do the following:
buffer = cv2.imencode('.jpg', image)[1].tostring()
bytes_image = base64.b64encode(buffer).decode('ascii')
{"image_bytes": {"b64": bytes_image}}
Also, my preprocessing and serving_input_receiver_fn functions changed:
def process_image_from_buffer(self, image_buffer):
    image = tf.image.decode_jpeg(image_buffer, channels=3)
    image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    image = tf.expand_dims(image, 0)
    image = tf.image.resize_bilinear(image, [256, 256], align_corners=False)
    image = tf.squeeze(image, [0])
    image = tf.cast(image, tf.float32) / 255.0
    return image

def serving_input_receiver_fn(self):
    input_ph = tf.placeholder(dtype=tf.string, shape=[None])
    images_tensor = tf.map_fn(self.process_image_from_buffer, input_ph, back_prop=False, dtype=tf.float32)
    return tf.estimator.export.ServingInputReceiver({'input_1': images_tensor}, {'image_bytes': input_ph})
process_image_from_buffer is different from the process_image_from_tf_example used above for training.
I also removed name='image_bytes' from input_ph above.
Hope it's clear enough to help someone else.
An excellent guide that I partially used for solving it.
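To make the request side concrete, here is a minimal client sketch (my own addition; it assumes a hypothetical model name my_model served on the default REST port 8501, and uses the requests library):

import base64
import cv2
import requests

image = cv2.imread('input.png')  # any image OpenCV can read
buffer = cv2.imencode('.jpg', image)[1].tobytes()
payload = {'instances': [
    {'image_bytes': {'b64': base64.b64encode(buffer).decode('ascii')}}
]}
response = requests.post('http://localhost:8501/v1/models/my_model:predict', json=payload)
print(response.json())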

How to use feed_dict in slim.learning.train of tensorflow

I read an example in tf-slim-mnist, and read one or two answers on Google, but all of them feed data to an 'images' tensor and a 'labels' tensor from an already filled-up tensor of data. For example, in tf-slim-mnist:
# load batch of dataset
images, labels = load_batch(
    dataset,
    FLAGS.batch_size,
    is_training=True)

def load_batch(dataset, batch_size=32, height=28, width=28, is_training=False):
    data_provider = slim.dataset_data_provider.DatasetDataProvider(dataset)
    image, label = data_provider.get(['image', 'label'])
    image = lenet_preprocessing.preprocess_image(
        image,
        height,
        width,
        is_training)
    images, labels = tf.train.batch(
        [image, label],
        batch_size=batch_size,
        allow_smaller_final_batch=True)
    return images, labels
Another example, in tensorflow github issues #5987,
graph = tf.Graph()
with graph.as_default():
    image, label = input('train', FLAGS.dataset_dir)
    images, labels = tf.train.shuffle_batch([image, label], batch_size=FLAGS.batch_size, capacity=1000 + 3 * FLAGS.batch_size, min_after_dequeue=1000)
    images_validation, labels_validation = inputs('validation', FLAGS.dataset_dir, 5000)
    images_test, labels_test = inputs('test', FLAGS.dataset_dir, 10000)
Because my data is of variable size, it is hard to fill up a tensor of data beforehand.
Is there any way to use feed_dict with slim.learning.train()? Is adding feed_dict as an argument to train_step_fn() a proper way to do it? If yes, how? Thanks.
I think feed_dict is not a good option when the input data varies in size and is hard to fit in memory.
Converting your data into tfrecords is a more proper way. Here is an example of converting data. You can then read the records back with TFRecordReader and parse_example.
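Since the linked conversion example is not reproduced here, a minimal sketch of writing variable-size images to a TFRecord file (my own sketch, assuming TF 1.x to match slim, and a hypothetical samples list of (image_path, label) pairs) could look like this:

import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

with tf.python_io.TFRecordWriter('train.tfrecords') as writer:
    for image_path, label in samples:
        image_bytes = tf.gfile.GFile(image_path, 'rb').read()  # raw encoded bytes, any size
        example = tf.train.Example(features=tf.train.Features(feature={
            'image_raw': _bytes_feature(image_bytes),
            'label': _int64_feature(label),
        }))
        writer.write(example.SerializeToString())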

Must the input of `tf.image.resize_images` have a static shape?

I run the code below and it raises a ValueError: 'images' contains no shape. Therefore I have to uncomment the line after # to set a static shape, but img_raw may have different shapes, and that line defeats the purpose of tf.image.resize_images.
I just want to turn images with different shapes into [227, 227, 3]. How should I do that?
def tf_read(file_queue):
    reader = tf.WholeFileReader()
    file_name, content = reader.read(file_queue)
    img_raw = tf.image.decode_image(content, 3)
    # img_raw.set_shape([227,227,3])
    img_resized = tf.image.resize_images(img_raw, [227, 227])
    img_shape = tf.shape(img_resized)
    return file_name, img_resized, img_shape
The issue here actually comes from the fact that tf.image.decode_image doesn't return the shape of the image. This was explained in these two GitHub issues: issue1, issue2.
The problem comes from the fact that tf.image.decode_image also handles .gif, which returns a 4D tensor, whereas .jpg and .png return 3D images. Therefore, the correct shape cannot be returned.
The solution is to simply use tf.image.decode_jpeg or tf.image.decode_png (both work the same and can be used on .png and .jpg images).
def _decode_image(filename):
    image_string = tf.read_file(filename)
    image_decoded = tf.image.decode_jpeg(image_string, channels=3)
    image = tf.cast(image_decoded, tf.float32)
    image_resized = tf.image.resize_images(image, [224, 224])
    return image_resized
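For completeness, here is a usage sketch (my own addition, assuming the tf.data API and a hypothetical list of file paths) showing where such a function typically goes:

filenames = tf.constant(['img1.jpg', 'img2.jpg'])  # hypothetical paths
dataset = tf.data.Dataset.from_tensor_slices(filenames)
dataset = dataset.map(_decode_image)  # each element becomes a [224, 224, 3] float image
dataset = dataset.batch(32)  # batching works because all images now share one shape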
No, tf.image.resize_images can handle dynamic shapes:
file_queue = tf.train.string_input_producer(['./dog1.jpg'])
# shape of dog1.jpg is (720, 720)
reader = tf.WholeFileReader()
file_name, content = reader.read(file_queue)
img_raw = tf.image.decode_jpeg(content, 3)  # size (?, ?, 3) <= dynamic h and w
# img_raw.set_shape([227,227,3])
img_resized = tf.image.resize_images(img_raw, [227, 227])
img_shape = tf.shape(img_resized)
with tf.Session() as sess:
    print(img_shape.eval())  # [227, 227, 3]
By the way, I am using tf v0.12, which has no function called tf.image.decode_image, but I don't think that matters here.
Of course you can use a tensor object as the size input for tf.image.resize_images.
So, by saying "turn images with different shapes to [227,227,3]", I suppose you don't want to lose their aspect ratio, right? To achieve this, you have to rescale the input image first, then pad the rest with zeros.
It should be noted, though, that you should consider performing image distortion and standardization before padding.
# Rescale so that one side of the image fits the corresponding side of the box,
# then pad the rest with zeros.
# target height is 227
# target width is 227
image = a_image_tensor_you_read
shape = tf.shape(image)
img_h = shape[0]
img_w = shape[1]
box_h = tf.convert_to_tensor(target_height)
box_w = tf.convert_to_tensor(target_width)
img_ratio = tf.cast(img_h, tf.float32) / tf.cast(img_w, tf.float32)
aim_ratio = tf.cast(box_h, tf.float32) / tf.cast(box_w, tf.float32)
# Note: the scale factor must be box side / image side (the original snippet had
# it inverted, which would blow the resized image up instead of fitting the box).
aim_h, aim_w = tf.cond(tf.greater(img_ratio, aim_ratio),
                       # relatively taller than the box: height is the limiting side
                       lambda: (box_h,
                                tf.cast(tf.cast(box_h, tf.float32) / tf.cast(img_h, tf.float32) * tf.cast(img_w, tf.float32), tf.int32)),
                       # relatively wider than the box: width is the limiting side
                       lambda: (tf.cast(tf.cast(box_w, tf.float32) / tf.cast(img_w, tf.float32) * tf.cast(img_h, tf.float32), tf.int32),
                                box_w))
image_resize = tf.image.resize_images(image, tf.cast([aim_h, aim_w], tf.int32), align_corners=True)
# Perform image standardization and distortion
image_standardized_distorted = blablabla
image_padded = tf.image.resize_image_with_crop_or_pad(image_standardized_distorted, box_h, box_w)
return image_padded  # (this snippet is meant to live inside your preprocessing function)