dimensions of images as input to LeNet_5 - tensorflow

I am still a beginner in deep learning. I am wondering: is it necessary for the input images to have a size of 32*32 (or X*X)? The dimensions of my images are 457*143.
Thank you.

If you want to implement a LeNet and train it from scratch, you don't have to resize your images. However, if you want to do transfer learning, you should resize your images to match the image size of the dataset on which your network was already trained.
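For illustration, here is a minimal LeNet-style sketch in Keras that takes 457*143 inputs directly; the single grayscale channel and the 10 output classes are assumptions, not details from the question:

import tensorflow as tf

# Minimal LeNet-style sketch; the input shape (457, 143, 1) and the
# number of classes (10) are assumptions for illustration only.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(457, 143, 1)),
    tf.keras.layers.Conv2D(6, kernel_size=5, activation="tanh"),
    tf.keras.layers.AveragePooling2D(pool_size=2),
    tf.keras.layers.Conv2D(16, kernel_size=5, activation="tanh"),
    tf.keras.layers.AveragePooling2D(pool_size=2),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(120, activation="tanh"),
    tf.keras.layers.Dense(84, activation="tanh"),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.summary()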

Related

Building an LSTM model for binary image classification with RGB images

I am struggling to build a binary image classification model for my custom image datasets (RGB) using an LSTM. I found some excellent tutorials that solve the same kind of problem but with grayscale images, and I am unable to sort out how to reshape my inputs.
Thanks in advance.
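One common workaround (an assumption here, since the question gives no sizes) is to treat each image row as one timestep, so an RGB image of shape (height, width, 3) becomes a sequence of height vectors of length width*3:

import tensorflow as tf

# Sketch only: the image size (128, 128, 3) and the LSTM width are assumptions.
height, width, channels = 128, 128, 3

model = tf.keras.Sequential([
    tf.keras.Input(shape=(height, width, channels)),
    # Each row of pixels becomes one timestep: (height, width * channels)
    tf.keras.layers.Reshape((height, width * channels)),
    tf.keras.layers.LSTM(64),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])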

Keras model val_accuracy is 1.00, but gives wrong output when testing

I've trained a model on a Kaggle dataset (this one) to detect hand gestures. When training, it gives val_accuracy = 1.00; here is an image, or you can see it using the link to colab.
When I test the model using an image from the dataset, it gives the right predictions, but when I use a real-world image of the "ok" gesture (you can see it at the end of the colab project), it gives wrong outputs. I've tried other images, and they also give wrong predictions.
Any help, please?
When you have a real-world image you want to predict on, you must process that image in exactly the same way as you processed the training images. For example (see the sketch after this list):
the image size must be the same
the pixel values must be scaled the same way
if trained on RGB images, the real-world image must be an RGB image
if trained on grayscale, the real-world image must be grayscale
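A minimal preprocessing sketch along those lines; the 224*224 target size, the 1/255 scaling, and the file names are assumptions standing in for whatever the training pipeline actually used:

import numpy as np
import tensorflow as tf
from tensorflow.keras.preprocessing import image

# Match the training preprocessing exactly; all concrete values below are placeholders.
model = tf.keras.models.load_model("gesture_model.h5")          # the trained model
img = image.load_img("ok_gesture.jpg", target_size=(224, 224))  # same size as training
x = image.img_to_array(img)           # RGB array of shape (224, 224, 3)
x = x / 255.0                         # same pixel scaling as training
x = np.expand_dims(x, axis=0)         # add the batch dimension
preds = model.predict(x)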

change the input image size for mobilenet_ssd using tensorflow

I am using TensorFlow and TFLite to detect objects. The model I use is mobilenet_ssd (version 2) from https://github.com/tensorflow/models/tree/master/research/object_detection
The input image size for detection is fixed at 300*300, which is hard-coded in the model.
I want to input a 1280*720 image for detection. How can I do this? I do not have a training image dataset with a resolution of 1280*720; I only have the Pascal and COCO datasets.
How can I modify the model to accept a 1280*720 image (without scaling the image) for detection?
To change the input image size, you need to redesign the anchor box positions, because the anchors are fixed to the input image resolution. Once you change the anchor positions to 720p, MobileNet can accept 720p as input.
The common practice is to scale the input image before feeding the data into TensorFlow / TensorFlow Lite.
Note: the images in the training dataset aren't 300*300 originally. The original resolution may be bigger and non-square, and the images are downscaled to 300*300. This means it is totally fine to downscale a 1280*720 image to 300*300, and it should work fine.
Would you mind trying scaling to see if it works?
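A rough sketch of that scaling step for a TFLite detector; the file names, and the uint8 input type of a quantized model, are assumptions:

import cv2
import numpy as np
import tensorflow as tf

# Downscale a 1280*720 frame to the model's fixed 300*300 input before detection.
interpreter = tf.lite.Interpreter(model_path="mobilenet_ssd_v2.tflite")  # placeholder path
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()

frame = cv2.imread("frame_1280x720.jpg")           # BGR, 1280*720 (placeholder file)
resized = cv2.resize(frame, (300, 300))            # downscale to the model's input size
rgb = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)
input_data = np.expand_dims(rgb, axis=0).astype(np.uint8)  # assumes a quantized model

interpreter.set_tensor(input_details[0]["index"], input_data)
interpreter.invoke()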

How to not resize the input image while running TensorFlow SSD inference

From what I understand of the Single Shot MultiBox Detector paper, it is a fully convolutional network. As such, it shouldn't require the rescaling that TensorFlow does (to 300x300) during inference. How can I remove this resizing during inference in TensorFlow?
You can configure this in the model config file:
image_resizer {
  fixed_shape_resizer {
    height: 300
    width: 300
  }
}
If you remove image_resizer it should work fine. But to answer your question: why do you want to remove the resizing?
Removing resizing would seriously impact training time and performance. And since both your training images and your input images are resized according to the TensorFlow model config, the model still 'sees' them the same way, in case you were worried about information loss. Also, SSD was trained on COCO, and for the aforementioned reasons of training time and performance, the authors chose to resize the images.
Though you could try the following alternatives to resizing if, for some reason, that is not what you want to do.
Multiple crops. For example, AlexNet was originally trained on 5 different crops: center, top-left, top-right, bottom-left, bottom-right.
Random crops. Just take a number of random crops from the image and hope that the neural network will not be biased (see the sketch after this list).
Resize and deform. Resize the image to a fixed size without considering the aspect ratio. This deforms the image contents, but you can be sure that no content is cut off.
Variable-sized inputs. Do not crop; train the network on variable-sized images, using something like Spatial Pyramid Pooling to extract a fixed-size feature vector that can be used with the fully connected layers.
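As a rough sketch of the random-crop option (the crop size and the number of crops are assumptions, not values from SSD):

import tensorflow as tf

# Take several random crops of a fixed size from one larger image.
# The crop size (300, 300, 3) and num_crops are illustrative assumptions.
def random_crops(image, num_crops=5, crop_size=(300, 300, 3)):
    # image: a 3-D tensor (height, width, channels) larger than crop_size
    return tf.stack([tf.image.random_crop(image, size=crop_size)
                     for _ in range(num_crops)])

image = tf.random.uniform((720, 1280, 3))   # stand-in for a real frame
crops = random_crops(image)                 # shape (5, 300, 300, 3)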

TensorFlow: Collecting my own training dataset & using that training dataset to find the location of an object

I'm trying to collect my own training dataset for image detection (well, recognition for now). Right now I have 4 classes and 750 images for each. Each image is just a regular image of its class; however, some images are blurry or contain outside objects, such as different backgrounds or other factors (but nothing distinguishable). Using that training dataset, image recognition is really bad.
My questions are:
1. Does the training image set need to contain the object in various backgrounds/settings/environments (I believe not...)?
2. Let's just say training worked fairly accurately and I want to know the location of the object in the image. I figure there is no way I can find the location just using image recognition, so if I use bounding boxes, how/where in the code can I see the location of the bounding box?
Thank you in advance!
It is difficult to know in advance what features your program will learn for each class. But then again, if your unseen images have the same background, the background will play no role. I would suggest data augmentation during training: random color distortion, random flipping, random cropping.
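A small sketch of such an augmentation step with tf.image; the parameter values and image sizes are assumptions and would need tuning:

import tensorflow as tf

# Illustrative augmentation pipeline; all magnitudes below are assumptions.
def augment(image):
    image = tf.image.random_brightness(image, max_delta=0.2)        # color distortion
    image = tf.image.random_saturation(image, lower=0.8, upper=1.2)
    image = tf.image.random_flip_left_right(image)                  # random flipping
    image = tf.image.random_crop(image, size=(200, 200, 3))         # random cropping
    return image

# Dummy dataset of 8 random 256*256 RGB images standing in for real training data.
dataset = tf.data.Dataset.from_tensor_slices(tf.random.uniform((8, 256, 256, 3)))
dataset = dataset.map(augment)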
You can't see in the code where the bounding box is. You have to label/annotate the boxes yourself first in your collected data, using a tool such as LabelMe, for example. Then comes training the object detector.