Using lidar images and camera images to perform object detection

I obtain depth and reflectance maps from lidar (as 2D images), and I also have camera images (2D images). The images have the same size.
I want to use a CNN to perform object detection using both kinds of image. It is a sort of "fusion CNN".
How am I supposed to do it? Am I supposed to use a pre-trained model? But there is no pre-trained model using lidar images.
Which CNN algorithm is best for this, i.e. for fusing modalities for object detection?
Thank you in advance.

Am I supposed to use a pre-trained model?
Yes, you should, unless you are very confident that you can get a working model by training from scratch yourself.
But there is no pre-trained model using lidar images
First, I'm pretty sure there are lidar-based networks, e.g.
L. Caltagirone, LIDAR-Camera Fusion for Road Detection Using Fully
Convolutional ... arXiv, 2018
Second, even if there is no open-source implementation that works directly on lidar data, you can always convert the lidar data to a depth image. For depth-image-based CNNs, there are many implementations for segmentation and detection.
How am I supposed to do it?
First, you can run two branches side by side, one for the RGB image and one for the depth/lidar 3D point cloud data, and feed each input to its own branch separately.
Second, you can combine them by merging the inputs into a single multi-channel tensor (for example, RGB plus depth as four channels) and transferring the pre-trained initial weights to that single model. Finally, perform transfer learning on your own dataset.
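As a rough sketch of the second option (an assumption about how it could look, not the answerer's actual code), the snippet below stacks the camera, depth and reflectance values of each pixel into one five-channel input and widens a pre-trained first-layer filter so that the extra channels start from the mean of the RGB weights. It uses plain Python lists and 1x1 filters to stay short; stack_channels, widen_first_conv and the mean initialization are illustrative choices, not part of any library.

```python
def stack_channels(rgb, depth, reflectance):
    """Merge per-pixel values into one H x W x 5 image (nested lists).

    rgb is H x W x 3; depth and reflectance are H x W, all the same size.
    """
    height, width = len(rgb), len(rgb[0])
    if len(depth) != height or len(reflectance) != height:
        raise ValueError("all inputs must have the same height")
    fused = []
    for y in range(height):
        if len(depth[y]) != width or len(reflectance[y]) != width:
            raise ValueError("all inputs must have the same width")
        row = []
        for x in range(width):
            # Channel order: R, G, B, depth, reflectance.
            row.append(list(rgb[y][x]) + [depth[y][x], reflectance[y][x]])
        fused.append(row)
    return fused


def widen_first_conv(filters, extra_channels):
    """Extend pre-trained first-layer filters to accept extra input channels.

    filters is a list of 1x1 filters, each a list with one weight per input
    channel. New channels are initialized to the mean of the existing
    weights (an assumed heuristic; zeros or random values are alternatives).
    """
    widened = []
    for weights in filters:
        mean_weight = sum(weights) / len(weights)
        widened.append(list(weights) + [mean_weight] * extra_channels)
    return widened


# Example: one row of two pixels, and two pre-trained RGB filters.
fused = stack_channels(
    rgb=[[(10, 20, 30), (40, 50, 60)]],
    depth=[[1.5, 2.5]],
    reflectance=[[0.1, 0.9]],
)
widened = widen_first_conv([[0.3, 0.6, 0.9], [1.0, 0.0, -1.0]], extra_channels=2)
```

The widened filters keep the original RGB weights untouched, so the network starts close to its pre-trained behaviour and learns how to use the lidar channels during fine-tuning.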
Best CNN algorithm?
That depends entirely on your task and hardware. Do you need the best processing speed or the best accuracy? Please define your "best".
Also, are you using it for an autonomous car or for an in-home nursing care system? Different CNN systems have their weights tuned for different purposes.
Generally, for real-time multi-object detection on a cheap PC, e.g. the DJI Manifold, I would suggest Tiny YOLO.

Related

Image Detector with tensorflow

I want to build a simple image detector for custom binary shapes in images.
I could train and use models from the object detection model zoo, such as ssd_inception_v2, but that would be extremely inefficient, as their sizes are in the hundreds of megabytes,
and I can't imagine using that in my simple app. Can anybody suggest how to solve this?
I have already built excellent small classifiers for my images, but I can't build a small, efficient detector (one that gives their positions as detection boxes).
I think what you need is transfer learning. I would take one of the lightweight models, such as MobileNetV2, and retrain it on my dataset. It should be pretty quick. If you want to decrease your model size even further, feel free to take only the first few layers of the CNN and retrain those. It would be a bit more work, since you need to rewrite the part of the network you want to use and load it with the pre-trained weights.

Image Detection & Classification - general approach?

I'm trying to build a detection + classification model that will recognize an object in an image and classify it. Every image will contain at most 1 object among my 10 classes (i.e. the same image cannot contain 2 classes). An image can, however, contain none of my classes/objects. I'm struggling with the general approach to this problem, especially because my objects have different sizes. This is what I have tried:
1. Trained a classifier with images that contain only my objects/classes, i.e. every image is the object itself with the background pre-removed. Since the objects/images have different shapes (aspect ratios), I had to reshape the images to the same size (destroying the aspect ratios). This would work just fine if my purpose was only to build a classifier, but since I also need to detect the objects, it didn't work so well.
2. The second approach was similar to (1), except that I didn't reshape the objects naively, but kept the aspect ratios by padding the image with 0 (black). This completely destroyed my classifier's ability to perform well (accuracy < 5%).
3. Mask RCNN: I followed this blog post to try to build a detector + classifier in the same model. The approach took forever and I wasn't sure it was the right approach. I even used external tools (RectLabel) to generate annotated image files containing information about the bounding boxes.
Question:
How should I approach this problem, on a general level:
Should I build 2 separate models? (One for detection/localization and one for classification?)
Should I be annotating my images using annotation files as in approach (3)?
Do I have to reshape my images at any stage?
Thanks,
PS. In all of my approaches, I augmented the images to generate ~500-1000 images per class.
To answer your questions:
No, you don't have to build two separate models. What you are describing is called object detection, which is classification along with localization. There are many models which do this: Mask_RCNN, Yolo, Detectron, SSD, etc.
Yes, you do need to annotate your images to train a model for your custom classes. Each of the models mentioned above needs a different annotation format.
No, you don't need to do any image resizing. Most of the time it is done when the model loads the data for training or inference.
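To make the resizing point concrete, here is a minimal sketch of what a data loader may do when it fits an image into a fixed network input while keeping the aspect ratio ("letterboxing"). This is only one common scheme and an assumption on my part; some loaders simply stretch the image instead. letterbox_params and map_box are hypothetical helper names. The key point is that the bounding-box annotations have to be transformed with the same scale and offset as the pixels.

```python
def letterbox_params(src_w, src_h, dst_w, dst_h):
    """Return (scale, pad_x, pad_y) for an aspect-preserving resize.

    The image is scaled by one factor so it fits inside dst_w x dst_h,
    then centered; pad_x / pad_y are the left / top border sizes in pixels.
    """
    scale = min(dst_w / src_w, dst_h / src_h)
    new_w = round(src_w * scale)
    new_h = round(src_h * scale)
    pad_x = (dst_w - new_w) // 2
    pad_y = (dst_h - new_h) // 2
    return scale, pad_x, pad_y


def map_box(box, scale, pad_x, pad_y):
    """Map (xmin, ymin, xmax, ymax) from source to letterboxed coordinates."""
    xmin, ymin, xmax, ymax = box
    return (
        xmin * scale + pad_x,
        ymin * scale + pad_y,
        xmax * scale + pad_x,
        ymax * scale + pad_y,
    )


# A 640x480 image placed into a 320x320 network input.
scale, pad_x, pad_y = letterbox_params(640, 480, 320, 320)
box_in_network_coords = map_box((100, 100, 200, 200), scale, pad_x, pad_y)
# scale is 0.5, pad_x is 0, pad_y is 40
# box_in_network_coords is (50.0, 90.0, 100.0, 140.0)
```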
You are on the right track with trying MaskRCNN.
Other than MaskRCNN, you could also try Yolo. There is also an accompanying easy-to-use annotating tool Yolo-Mark.
If you go through this tutorial, you will find what you need:
How to train your own Object Detector with TensorFlow’s Object Detector API
The SSD model is small, so it would not take much time to train.
There are several object detection models to choose from.
In RectLabel, you can save bounding boxes in the PASCAL VOC format.
You can also export TFRecord files for TensorFlow.
https://rectlabel.com/help#tf_record
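If you save boxes in PASCAL VOC format but end up training a YOLO-style model, the annotations need converting. Below is a minimal sketch, assuming the usual VOC XML layout (size/width, size/height, object/name, object/bndbox) and the common YOLO text format of "class x_center y_center width height", normalized to the image size. voc_to_yolo and the class list are hypothetical; check the exact format your training tool expects.

```python
import xml.etree.ElementTree as ET


def voc_to_yolo(xml_text, class_names):
    """Convert one PASCAL VOC annotation (XML string) to YOLO text lines."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    img_w = float(size.find("width").text)
    img_h = float(size.find("height").text)

    lines = []
    for obj in root.findall("object"):
        # Class index is the position of the label in class_names.
        class_id = class_names.index(obj.find("name").text)
        box = obj.find("bndbox")
        xmin = float(box.find("xmin").text)
        ymin = float(box.find("ymin").text)
        xmax = float(box.find("xmax").text)
        ymax = float(box.find("ymax").text)
        # YOLO stores the box center and size as fractions of the image.
        x_center = (xmin + xmax) / 2 / img_w
        y_center = (ymin + ymax) / 2 / img_h
        width = (xmax - xmin) / img_w
        height = (ymax - ymin) / img_h
        lines.append(
            f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"
        )
    return lines
```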

How to know what TensorFlow actually "sees"?

I'm using a CNN built with Keras (TensorFlow) to do visual recognition.
I wonder if there is a way to know what my own TensorFlow model "sees".
Google had a news story showing the cat face in the AI brain.
https://www.smithsonianmag.com/innovation/one-step-closer-to-a-brain-79159265/
Can anybody tell me how to extract such an image from my own CNN?
For example, what does my own CNN model recognize as a car?
We have to distinguish between what TensorFlow actually sees:
As we go deeper into the network, the feature maps look less like the
original image and more like an abstract representation of it. As you
can see in block3_conv1 the cat is somewhat visible, but after that it
becomes unrecognizable. The reason is that deeper feature maps encode
high level concepts like “cat nose” or “dog ear” while lower level
feature maps detect simple edges and shapes. That’s why deeper feature
maps contain less information about the image and more about the class
of the image. They still encode useful features, but they are less
visually interpretable by us.
and what we can reconstruct from it through some kind of reverse "deconvolution" process (which is not actually a deconvolution in the mathematical sense).
To answer your actual question, there are a lot of good example solutions out there; one you can study is: Visualizing output of convolutional layer in tensorflow.
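To give a feel for what a low-level feature map is, here is a small self-contained sketch in plain Python (not Keras or TensorFlow code). It slides a hand-written vertical-edge filter over a tiny image, which is the same sliding-window operation a convolutional layer applies with learned filters. The filter values and the toy image are made up for illustration. In Keras, the usual way to get the real feature maps is to build a second model whose output is the intermediate layer's output and call predict on it.

```python
def conv2d_valid(image, kernel):
    """Slide kernel over image (no padding, stride 1), as a CNN layer does."""
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    output = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            acc = 0
            for dy in range(k_h):
                for dx in range(k_w):
                    acc += image[y + dy][x + dx] * kernel[dy][dx]
            row.append(acc)
        output.append(row)
    return output


# Responds where brightness increases from left to right.
VERTICAL_EDGE = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

# Toy 4x6 image: dark left half, bright right half.
image = [[0, 0, 0, 9, 9, 9] for _ in range(4)]

feature_map = conv2d_valid(image, VERTICAL_EDGE)
# feature_map is [[0, 27, 27, 0], [0, 27, 27, 0]]:
# large values only around the dark-to-bright boundary.
```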
When you build a model to perform visual recognition, you give it labelled data (pictures, in this case) so that it can adjust its weights according to the training data. If you wish to build a model that can recognize a car, you have to train it on a large training set containing labelled pictures. This type of recognition is basically categorical recognition, i.e. classification.
You can experiment with the MNIST dataset, which provides pictures of handwritten digits for image recognition.

What to expect from deep learning object detection on black and white pictures?

With TensorFlow, I want to train an object detection model on my own images, based on the ssd_inception_v2_coco model. The problem I have is that all my pictures are black and white. What performance can I expect? Should I try to colorize my B&W pictures first? Or, on the contrary, should I try to retrain the base network with "uncolorized" images? Are there general guidelines for B&W processing of images for deep learning object detection?
I wouldn't go through the trouble of colorizing if you are planning on using a pretrained model. I would expect that explicitly colorizing your images as a pre-processing step would help very little (if at all) since in theory the features that a colorizing network learns can also be learned by the detection network.
If you are planning on fine-tuning a detection network that was pretrained on an RGB dataset, make sure you either (i) replace the first convolution in the network with a convolutional layer that expects a single-channel input, or (ii) pad your image with two all-zero channels.
You may get slightly worse detection performance simply because a B&W image has one channel per pixel instead of three, so the color information that RGB provides is lost.
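The two input-adaptation options above can be sketched on a single pixel with a 1x1 filter. This is only an illustration in plain Python; the function names are hypothetical. Initializing the new single-channel filter as the sum of the pre-trained RGB weights is an assumption, not something stated above. It is convenient because it gives the same response as feeding the gray value into all three channels of the original filter, whereas zero-padding only ever uses the first channel's weight.

```python
def filter_response(pixel, weights):
    """Response of one 1x1 filter: dot product over the input channels."""
    if len(pixel) != len(weights):
        raise ValueError("pixel and filter must have the same channel count")
    return sum(p * w for p, w in zip(pixel, weights))


def collapse_to_single_channel(rgb_weights):
    """Option (i): single-channel filter initialized as the sum of RGB weights."""
    return [sum(rgb_weights)]


def pad_with_zero_channels(gray):
    """Option (ii): gray value in the first channel, zeros in the other two."""
    return [gray, 0.0, 0.0]


def replicate_gray(gray):
    """Reference: the gray value copied into all three channels."""
    return [gray, gray, gray]


rgb_weights = [0.2, 0.5, 0.3]  # made-up pre-trained filter
gray = 4.0

collapsed = filter_response([gray], collapse_to_single_channel(rgb_weights))
replicated = filter_response(replicate_gray(gray), rgb_weights)
zero_padded = filter_response(pad_with_zero_channels(gray), rgb_weights)
# collapsed and replicated agree (up to floating-point rounding);
# zero_padded only sees the first weight: 4.0 * 0.2.
```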

How to make an object detector from an image classifier?

I have a TensorFlow model (a retrained Inception model) which can classify 5 classes of vehicles. Now I need to make an object detector for all 5 classes with this trained model. Can it be done by removing the last layer? Can anyone suggest how to proceed?
If you really need to use your pretrained network, then you can detect potential boxes of interest and then apply your network to each of them. These boxes can be determined with an "objectness" method, such as EdgeBox.
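A minimal sketch of that "propose boxes, then classify each" pipeline is below. To keep it self-contained it uses plain sliding windows as a stand-in for EdgeBox (a much cruder proposal method), and classify is a hypothetical callback that would wrap your retrained classifier and return a (label, score) pair for a box. Overlapping detections are then merged with non-maximum suppression. The window size, stride and thresholds are arbitrary example values.

```python
def sliding_windows(img_w, img_h, win, stride):
    """Yield square (xmin, ymin, xmax, ymax) windows covering the image."""
    for y in range(0, img_h - win + 1, stride):
        for x in range(0, img_w - win + 1, stride):
            yield (x, y, x + win, y + win)


def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0


def detect(img_w, img_h, classify, win=64, stride=32,
           min_score=0.5, max_iou=0.3):
    """Score every window, then keep the best non-overlapping ones."""
    scored = []
    for box in sliding_windows(img_w, img_h, win, stride):
        label, score = classify(box)  # hypothetical classifier wrapper
        if score >= min_score:
            scored.append((score, label, box))

    # Non-maximum suppression: visit boxes from highest to lowest score
    # and drop any box that overlaps an already kept one too much.
    scored.sort(key=lambda item: item[0], reverse=True)
    kept = []
    for candidate in scored:
        if all(iou(candidate[2], k[2]) <= max_iou for k in kept):
            kept.append(candidate)
    return kept
```

In practice the classifier would be run on the image crop for each box; this is slow compared to integrated detectors because the network is evaluated once per window.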
However, nowadays object detection is usually done in a more integrated way, as in Faster RCNN. Such an approach integrates a component named the Region Proposal Network (RPN), which determines the regions of interest jointly with the recognition of the classes.
To the best of my knowledge, one of the best recent approaches is Yolo, but it is natively based on Darknet.