TensorFlow RGB-D Training

I have RGB-D (color & depth) images of a given scene. I would like to use TensorFlow to train a classification model based on a pre-trained network such as Inception. As far as I understand, these pre-trained models were built using 3-channel RGB images, so they cannot handle the inclusion of a 4th channel.
How do I use RGB-D images directly? Do I need to pre-process the images and separate RGB and D? If so, how do I use the D (single-channel) data alone?
Thank you!

If you want to use a pre-trained model you can only use the RGB channels, as these models were only trained to understand RGB. In that case it is as you said: separate the channels and discard depth.
To use a 4-channel image like this you would need to retrain the network from scratch rather than loading a pre-trained set of weights.
You will probably get good results using the same architecture as is used for 3-channel images (save for the minor change required to support the 4-channel input), so retraining shouldn't be terribly hard.
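As a rough sketch of what that minor change looks like in Keras, the only difference from a standard RGB setup is the input shape; the class count, image size, and variable names below are assumptions, not taken from the question:

```python
import tensorflow as tf

# Minimal sketch: an Inception backbone built for 4-channel (RGB-D) input
# and trained from scratch. num_classes and the 224x224 size are assumptions.
num_classes = 10

model = tf.keras.applications.InceptionV3(
    weights=None,                # no pre-trained weights: train from scratch
    input_shape=(224, 224, 4),   # 4th channel holds depth
    classes=num_classes,
)

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(rgbd_images, labels, ...)  # rgbd_images shaped (N, 224, 224, 4)
```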

Related

training image datasets for object detection

Which version of YOLO-TensorFlow (customised CNN like GoogLeNet) is preferred for traffic science?
If the training images are blurred or noisy, is it okay to train on them, or what steps should be taken to prepare the training dataset images?
You may need to curate your own dataset using frames from a traffic camera and manually tagging images of cars where the passengers' seatbelts are or are not buckled, as this is a very specialized task. From there, you can do data augmentation (perhaps using the Keras ImageDataGenerator class, as sketched below). If a human can identify a seatbelt in an image that is blurred or noisy, a model can learn from it. You can then use transfer learning from a pre-trained CNN model like Inception (this is a helpful tutorial for how to do that), or train your own binary classifier on your tagged images, where the inputs are frames of traffic-camera video.
I'd suggest learning the basics of CNNs with these models first, and only then diving into a more complicated model like YOLO.
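For the augmentation step, a minimal sketch with the Keras ImageDataGenerator might look like this; the directory layout, image size, and parameter values are assumptions, not requirements:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Minimal augmentation sketch; paths and parameter values are assumptions.
datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=10,         # small rotations
    width_shift_range=0.1,     # horizontal shifts
    height_shift_range=0.1,    # vertical shifts
    zoom_range=0.1,
    horizontal_flip=True,
)

train_generator = datagen.flow_from_directory(
    "traffic_frames/train",    # hypothetical path: one sub-folder per class
    target_size=(224, 224),
    batch_size=32,
    class_mode="binary",       # e.g. buckled vs. not buckled
)
# model.fit(train_generator, epochs=...)
```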

Pre-Trained LeNet Model for License Plate Recognition

I have implemented a form of the LeNet model via TensorFlow and Python for a car number plate recognition system. My model was trained solely on my training data and tested on the test data. My dataset contains segmented images in which every image has only one character. This is what my data looks like. My model does not perform very well, so I'm now looking for models I can use via transfer learning. Since most models are already trained on humongous datasets, I looked over a few such as AlexNet, ResNet, GoogLeNet and Inception v2. Most of these models have not been trained on the type of data I want, which would be letters and digits.
Question: Should I still go forward with one of these models and train them on my dataset, or are there better models which would help? For such models, would Keras be a better option since it is more high-level than TensorFlow?
Question: I'd prefer to work with the LeNet model itself, since training the other models would definitely take a long time due to the insufficient specs of my laptop. So is there any implementation of the model trained on machine-printed character images, which I could use to then train the final layers of the model on my data?
To get good results you should use a model explicitly designed for text recognition.
First, (roughly) crop the input image to the region around the text.
Then, feed the image of the text into a neural network (NN) to detect the text.
A typical NN for text recognition extracts relevant features (with convolutional NN), propagates those features through the image (with recurrent NN) and finally predicts a character score for each position in the image.
Usually, those networks are trained with the CTC loss.
As a starting point I would suggest looking at the CRNN implementation (they also provide a pre-trained model) [1] and the corresponding paper [2]. There is, as far as I remember, also a TensorFlow implementation on GitHub.
You can use any framework (e.g. TensorFlow or CNTK or ...) you like as long as it features convolutional and recurrent NNs and the CTC loss.
I once attended a presentation about CNTK where they claimed that they have a very fast implementation of recurrent NN - so maybe CNTK would be a good choice for your slow computer?
[1] CRNN implementation: https://github.com/bgshih/crnn
[2] Shi - An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
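A minimal Keras sketch of that CNN + RNN + per-step character prediction pipeline is shown below; the image size, character set, and layer widths are assumptions, and the CTC training step (label and length tensors) is only indicated in the comments:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Sketch of a CRNN-style recognizer: conv features -> sequence -> per-step
# character scores. All sizes and the character set are assumptions.
num_chars = 36 + 1                                    # A-Z, 0-9 plus the CTC blank

inputs = layers.Input(shape=(32, 128, 1))             # grayscale plate crop
x = layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
x = layers.MaxPooling2D((2, 2))(x)                    # 16 x 64
x = layers.Conv2D(128, 3, padding="same", activation="relu")(x)
x = layers.MaxPooling2D((2, 2))(x)                    # 8 x 32
x = layers.Permute((2, 1, 3))(x)                      # width becomes the time axis
x = layers.Reshape((32, 8 * 128))(x)                  # 32 time steps, 1024 features
x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
outputs = layers.Dense(num_chars, activation="softmax")(x)

model = tf.keras.Model(inputs, outputs)
# Training would use the CTC loss (e.g. tf.keras.backend.ctc_batch_cost),
# comparing the 32 per-step distributions against the label sequence.
```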

Is there a way to recognise an object in an image?

I am looking for a pre-trained deep learning model which can recognise an object in an image. Usually the images are of the type used on shopping websites for products. I want to recognise what the product in the image is. I have come across some pre-trained models like VGG and Inception, but they seem to be trained on a fairly small set of general classes, around 1000 objects. I am looking for something trained on more, say 10000 classes or more.
I think the best way to do this is to build your own training set with the labels that you need to predict, then take an existing pre-trained model like VGG, remove the last fully connected layers and train the model with your data, a process called transfer learning. Some more info here.
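A minimal sketch of that setup in Keras might look like the following; the class count, image size, and the layer sizes of the new head are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers

num_classes = 10000   # assumption: the larger product-label set you curate

# Load VGG16 without its original fully connected layers and freeze it.
base = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False

# Attach a new classification head trained only on your product images.
model = tf.keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dense(num_classes, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
# model.fit(product_images, product_labels, epochs=...)
```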

What to expect from deep learning object detection on black and white pictures?

With TensorFlow, I want to train an object detection model on my own images based on the ssd_inception_v2_coco model. The problem I have is that all my pictures are black and white. What performance can I expect? Should I try to colorize my B&W pictures first? Or, on the contrary, should I try to retrain the base network with "uncolorized" images? Are there general guidelines for B&W processing of images for deep learning object detection?
I wouldn't go through the trouble of colorizing if you are planning on using a pretrained model. I would expect that explicitly colorizing your images as a pre-processing step would help very little (if at all) since in theory the features that a colorizing network learns can also be learned by the detection network.
If you plan to fine-tune a detection network that was pretrained on an RGB dataset, make sure you either (i) replace the first convolution in the network with a convolutional layer that expects a single-channel input, or (ii) pad your image with two all-zero channels.
You may get slightly worse detection performance simply because you lose two thirds of the image's pixel information when using BW instead of RGB.
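As an illustration of option (ii), here is a small sketch that pads a single-channel image with two all-zero channels before feeding it to an RGB-pretrained network; the image size is an assumption:

```python
import tensorflow as tf

def gray_to_pseudo_rgb(gray):
    # Pad a (H, W, 1) grayscale image with two all-zero channels -> (H, W, 3).
    zeros = tf.zeros_like(gray)
    return tf.concat([gray, zeros, zeros], axis=-1)

# Example: a hypothetical 300x300 B&W frame fed to an RGB-pretrained detector.
bw_image = tf.random.uniform((300, 300, 1))
rgb_like = gray_to_pseudo_rgb(bw_image)   # shape (300, 300, 3)
```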

VGG16 Transfer learning with an additional input source

I am trying to use TensorFlow for transfer learning with a pre-trained VGG16 model.
However, the input to the model in my problem is an RGB image with an extra channel functioning as a binary mask. This is different from the original input the model was trained on (224x224 RGB images).
I think that using the pretrained model is still possible in this case. How do I assign weights for connections between the first convolutional layer and the extra channel? Is transfer learning still applicable in such a scenario?
Thanks!
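One common approach to the weight-assignment question (a sketch under assumptions, not a definitive answer) is to build a 4-channel copy of the network, copy the pre-trained RGB filters into the first three input channels of the first convolution, and zero-initialize the filters for the mask channel. The layer names below follow the Keras VGG16 application; everything else is an assumption:

```python
import numpy as np
import tensorflow as tf

# Sketch: extend pre-trained RGB filters to a 4th (mask) channel.
rgb_vgg = tf.keras.applications.VGG16(weights="imagenet", include_top=False,
                                      input_shape=(224, 224, 3))
rgbm_vgg = tf.keras.applications.VGG16(weights=None, include_top=False,
                                       input_shape=(224, 224, 4))

for src, dst in zip(rgb_vgg.layers, rgbm_vgg.layers):
    weights = src.get_weights()
    if not weights:
        continue                                     # skip input/pooling layers
    if src.name == "block1_conv1":
        kernel, bias = weights                       # kernel shape (3, 3, 3, 64)
        new_kernel = np.zeros(dst.get_weights()[0].shape, dtype=kernel.dtype)
        new_kernel[:, :, :3, :] = kernel             # copy the RGB filters
        # the mask channel (index 3) stays zero-initialized
        dst.set_weights([new_kernel, bias])
    else:
        dst.set_weights(weights)                     # all other layers copy directly
```

This keeps the RGB filters pre-trained while letting the network learn the mask channel's contribution during fine-tuning.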