Is there any way to train and detect objects from lidar data? - object-detection

I am a beginner in machine learning, but I've been using YOLOv5 and YOLOv7 to train on and detect objects in images. Training was done by annotating the objects with annotation tools such as Roboflow. I now want to move into lidar and point cloud data. Is there a way to annotate objects in lidar data, then train a model and save it so that it can later detect objects in point cloud data? Can this be done with these algorithms, or in some other way?

Related

Image Detection & Classification - general approach?

I'm trying to build a detection + classification model that will recognize an object in an image and classify it. Every image will contain at most 1 object among my 10 classes (i.e. the same image cannot contain 2 classes). An image can, however, contain none of my classes/objects. I'm struggling with the general approach to this problem, especially because my objects have different sizes. This is what I have tried:
1. Trained a classifier with images that only contain my objects/classes, i.e. every image is the object itself with the background pre-removed. Since the objects/images have different shapes (aspect ratios), I had to reshape the images to the same size (destroying the aspect ratios). This would work just fine if my purpose was only to build a classifier, but since I also need to detect the objects, this didn't work so well.
2. The second approach was similar to (1), except that I didn't reshape the objects naively, but kept the aspect ratios by padding the image with 0 (black). This completely destroyed my classifier's ability to perform well (accuracy < 5%).
3. Mask RCNN - I followed this blogpost to try to build a detector + classifier in the same model. The approach took forever and I wasn't sure it was the right approach. I even used external tools (RectLabel) to generate annotated image files containing information about the bounding boxes.
Question:
How should I approach this problem, on a general level:
1. Should I build 2 separate models? (One for detection/localization and one for classification?)
2. Should I be annotating my images using an annotation file as in approach (3)?
3. Do I have to reshape my images at any stage?
Thanks,
PS. In all of my approaches, I augmented the images to generate ~500-1000 images per class.
To answer your questions:
1. No, you don't have to build two separate models. What you are describing is called object detection, which is classification along with localization. There are many models which do this: Mask_RCNN, Yolo, Detectron, SSD, etc.
2. Yes, you do need to annotate your images to train a model for your custom classes. Each of the models mentioned above needs a different annotation format.
3. No, you don't need to resize the images yourself. Most of the time this is done when the model loads the data for training or inference.
You are on the right track with trying MaskRCNN.
Other than MaskRCNN, you could also try Yolo. There is also an accompanying easy-to-use annotating tool Yolo-Mark.
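On the resizing point, here is a minimal sketch of the aspect-preserving ("letterbox") resize that many detection pipelines apply at load time. The function names and the square target size are assumptions made for illustration; they are not taken from any of the models mentioned above, and each framework has its own resizing rules.

```python
def letterbox_params(width, height, target):
    """Return (new_w, new_h, pad_left, pad_top) for fitting an image into a
    target x target square without changing its aspect ratio."""
    scale = min(target / width, target / height)
    new_w = round(width * scale)
    new_h = round(height * scale)
    # Centre the resized image; the remaining border is padding.
    pad_left = (target - new_w) // 2
    pad_top = (target - new_h) // 2
    return new_w, new_h, pad_left, pad_top


def box_to_letterbox(box, width, height, target):
    """Map a pixel box (xmin, ymin, xmax, ymax) into the letterboxed image."""
    new_w, new_h, pad_left, pad_top = letterbox_params(width, height, target)
    sx, sy = new_w / width, new_h / height
    xmin, ymin, xmax, ymax = box
    return (xmin * sx + pad_left, ymin * sy + pad_top,
            xmax * sx + pad_left, ymax * sy + pad_top)
```

The point of the second function is that the bounding boxes have to be transformed together with the image, which is why this step is usually left to the training framework.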
If you go through this tutorial, it should cover what you need:
How to train your own Object Detector with TensorFlow’s Object Detector API
The SSD model is small, so training it does not take much time.
There are other object detection models as well.
On RectLabel, you can save bounding boxes in the PASCAL VOC format.
You can export TFRecord for Tensorflow.
https://rectlabel.com/help#tf_record
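Since different detectors expect different annotation formats, a small converter is often needed. The sketch below converts a PASCAL VOC style box (pixel xmin, ymin, xmax, ymax) to the normalized centre/size form used in YOLO label files, and back. The function names are hypothetical and the code does not read or write any real annotation file.

```python
def voc_to_yolo(box, img_w, img_h):
    """Pixel (xmin, ymin, xmax, ymax) -> normalized (x_center, y_center, w, h)."""
    xmin, ymin, xmax, ymax = box
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    return x_center, y_center, (xmax - xmin) / img_w, (ymax - ymin) / img_h


def yolo_to_voc(box, img_w, img_h):
    """Normalized (x_center, y_center, w, h) -> pixel (xmin, ymin, xmax, ymax)."""
    x_center, y_center, w, h = box
    return ((x_center - w / 2) * img_w, (y_center - h / 2) * img_h,
            (x_center + w / 2) * img_w, (y_center + h / 2) * img_h)
```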

Tensorflow object detection api: how to use imgaug for augmentation?

I've been hand-rolling augmenters using imgaug, as I really like some of the options that are not available in the tf object detection api. For instance, I use motion blur because so much of my data has fast-moving, blurry objects.
How can I best integrate my augmentation sequence with the api for on-the-fly training?
E.g., say I have an augmenter:
import imgaug.augmenters as iaa  # assumed import; the original snippet omits it
aug = iaa.SomeOf((0, 2),
    [iaa.Fliplr(0.5), iaa.Flipud(0.5), iaa.Affine(rotate=(-10, 10))])
Is there some way to configure the object detection api to work with this?
What I am currently doing is using imgaug to generate (augmented) training data, and then creating tfrecord files from each iteration of this augmentation pipeline. This is very inefficient as I am saving large amounts of data to disk rather than running augmentation on the fly, during training.
Someone has made a repo for this:
https://github.com/JinLuckyboy/TensorFlowObjectDetectionAPI-with-imgaug
Sorry this is not a code answer and I have not actually looked into it, so I will not mark this as officially answered. If I ever get a chance to test it I will let people know.
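To illustrate what "on the fly" means here, independent of the repo above: the sketch below is a plain-Python stand-in that uses neither imgaug nor the object detection API. It augments each sample lazily inside a generator, so nothing augmented is written to disk. The names and the nested-list image representation are assumptions for illustration only.

```python
import random


def hflip_boxes(boxes, img_w):
    """Mirror pixel boxes (xmin, ymin, xmax, ymax) horizontally."""
    return [(img_w - xmax, ymin, img_w - xmin, ymax)
            for xmin, ymin, xmax, ymax in boxes]


def augmented_stream(samples, flip_prob=0.5, seed=None):
    """Yield (image_rows, boxes) pairs, randomly flipping each one as it is
    requested instead of storing augmented copies."""
    rng = random.Random(seed)
    for image_rows, boxes in samples:
        if rng.random() < flip_prob:
            img_w = len(image_rows[0])
            image_rows = [row[::-1] for row in image_rows]  # flip pixels
            boxes = hflip_boxes(boxes, img_w)               # flip labels too
        yield image_rows, boxes
```

A real pipeline would call the imgaug augmenter at the point where this sketch flips the rows, passing the bounding boxes along so that image and labels stay consistent.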

Using Lidar images and Camera images to perform object detection

I obtain depth and reflectance maps from Lidar (2D images) and I also have camera images (2D images). The images have the same size.
I want to use a CNN to perform object detection using both images. It is a sort of "fusion CNN".
How am I supposed to do it? Am I supposed to use a pre-trained model? But there is no pre-trained model using lidar images.
Which is the best CNN algorithm to do it, i.e. for fusing modalities for object detection?
Thank you in advance
Am I supposed to use a pre-trained model?
Yes, you should, unless you are very confident that you can find a working model from scratch by yourself.
But there is no pre-trained model using lidar images
First, I'm pretty sure there are LIDAR-based networks, e.g.
L. Caltagirone, LIDAR-Camera Fusion for Road Detection Using Fully Convolutional ..., arXiv, 2018
Second, even if there is no open-source implementation that works directly on LIDAR data, you can always convert the LIDAR data to a depth image. For depth-image-based CNNs, there are hundreds of implementations for segmentation and detection.
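As a rough sketch of the LIDAR-to-depth-image conversion, the code below uses a spherical projection, which is one common choice and not necessarily what the cited paper does. The function name, the image size and the vertical field-of-view parameters are assumptions; real values depend on the sensor.

```python
import math


def points_to_range_image(points, width, height, fov_up_deg, fov_down_deg):
    """Project (x, y, z) points into a height x width range image.
    Empty pixels stay 0.0; if several points hit one pixel, the nearest wins."""
    fov_up = math.radians(fov_up_deg)
    fov_down = math.radians(fov_down_deg)
    fov = fov_up - fov_down
    image = [[0.0] * width for _ in range(height)]
    for x, y, z in points:
        r = math.sqrt(x * x + y * y + z * z)
        if r == 0.0:
            continue
        yaw = math.atan2(y, x)    # horizontal angle in [-pi, pi]
        pitch = math.asin(z / r)  # vertical angle
        if not fov_down <= pitch <= fov_up:
            continue              # outside the assumed vertical field of view
        col = int(0.5 * (1.0 - yaw / math.pi) * width)
        row = int((fov_up - pitch) / fov * height)
        col = min(max(col, 0), width - 1)
        row = min(max(row, 0), height - 1)
        if image[row][col] == 0.0 or r < image[row][col]:
            image[row][col] = r
    return image
```

The resulting 2D array can then be treated like a single-channel image by any depth-image CNN.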
How am I supposed to do it?
First, you can run the two inputs side by side in parallel branches, one for RGB and one for the depth/LIDAR 3D point cloud, and feed them separately.
Second, you can also combine them by merging the inputs into a single 4-channel tensor (RGB plus depth) and transferring the initial weights to the single model. Finally, perform transfer learning on your dataset.
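A minimal sketch of the second option, written with nested lists so it stays dependency-free (in practice you would use array operations from your framework). Initialising the new depth channel of the first convolution with the mean of the RGB weights is a common heuristic and an assumption here, not something the answer prescribes. All names are hypothetical.

```python
def stack_rgb_depth(rgb, depth):
    """rgb: H x W x 3 nested lists, depth: H x W. Returns H x W x 4."""
    if len(rgb) != len(depth) or any(len(r) != len(d) for r, d in zip(rgb, depth)):
        raise ValueError("RGB and depth images must have the same size")
    return [[list(pixel) + [d] for pixel, d in zip(rgb_row, depth_row)]
            for rgb_row, depth_row in zip(rgb, depth)]


def add_depth_channel(filters):
    """filters[f][c][i][j] holds pre-trained first-layer weights for c in 0..2.
    Returns a copy with a 4th input channel set to the mean of the RGB ones."""
    expanded = []
    for channels in filters:
        mean = [[sum(ch[i][j] for ch in channels) / len(channels)
                 for j in range(len(channels[0][i]))]
                for i in range(len(channels[0]))]
        expanded.append([[row[:] for row in ch] for ch in channels] + [mean])
    return expanded
```

Only the first layer changes shape; the rest of the pre-trained weights can be copied as-is before fine-tuning on the fused data.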
best CNN algorithm?
Totally depends on your task and hardware. Do you need best in processing speed or best in accuracy? Define your "best", please.
Also, are you using it for an autonomous car or for an in-house nurse care system? Different CNN systems tune their weights for different purposes.
Generally, for real-time multiple object detection on a cheap PC, e.g. a DJI Manifold, I would suggest Yolo-tiny.

how to use tensorflow object detection API for face detection

OpenCV provides a simple API to detect and extract faces from given images. (I do not think it works perfectly, though, because in my experience it cuts out regions of the input pictures that have nothing to do with faces.)
I wonder if the TensorFlow API can be used for face detection. I failed to find relevant information, but I am hoping that someone experienced in the field can guide me on this subject. Can TensorFlow's object detection API be used for face detection in the same way as OpenCV? (I mean, you just call the API function and it gives you the face image from the given input image.)
You can, but some work is needed.
First, take a look at the object detection README. There are some useful articles you should follow. Specifically: (1) Configuring an object detection pipeline, (2) Preparing inputs and (3) Running locally. You should start with an existing architecture with a pre-trained model. Pretrained models can be found in Model Zoo, and their corresponding configuration files can be found here.
The most common pre-trained models in Model Zoo are trained on the COCO dataset. Unfortunately this dataset doesn't contain face as a class (but does contain person).
Instead, you can start with a pre-trained model on Open Images, such as faster_rcnn_inception_resnet_v2_atrous_oid, which does contain face as a class.
Note that this model is larger and slower than common architectures used on the COCO dataset, such as SSDLite over MobileNetV1/V2. This is because Open Images has a lot more classes than COCO, and therefore a well-performing model needs to be much more expressive in order to distinguish between the large number of classes and localize them correctly.
Since you only want face detection, you can try the following two options:
If you're okay with a slower model which will probably result in better performance, start with faster_rcnn_inception_resnet_v2_atrous_oid; you may only need to fine-tune it slightly on the single class of face.
If you want a faster model, you should probably start with something like SSDLite-MobileNetV2 pre-trained on COCO, but then fine-tune it on the class of face from a different dataset, such as your own or the face subset of Open Images.
Note that the fact that the pre-trained model isn't trained on faces doesn't mean you can't fine-tune it to be, but rather that it might take more fine-tuning than a pre-trained model which was pre-trained on faces as well.
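Once a model is fine-tuned, getting "the face image" is a post-processing step. The sketch below assumes the detector returns boxes as normalized [ymin, xmin, ymax, xmax] together with class ids and scores, which is the layout the object detection API uses for its detection outputs. The function name, the face class id and the score threshold are assumptions; the id depends on your label map.

```python
def face_crops(boxes, classes, scores, face_class_id, img_w, img_h, min_score=0.5):
    """Return pixel crop boxes (left, top, right, bottom) for confident
    detections of the face class."""
    crops = []
    for (ymin, xmin, ymax, xmax), cls, score in zip(boxes, classes, scores):
        if cls != face_class_id or score < min_score:
            continue  # other class, or not confident enough
        crops.append((int(xmin * img_w), int(ymin * img_h),
                      int(xmax * img_w), int(ymax * img_h)))
    return crops
```

Each returned tuple can be passed to an image library's crop function to extract the face region.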
Just increase the size of the input; I tried it and it works much better.

how to make an object detector from an image classifier?

I have a TensorFlow model (a retrained Inception model) which can classify 5 classes of vehicles. Now I need to make an object detector for all 5 classes with this trained model. Can it be done by removing the last layer? Can anyone suggest how to proceed?
If you really need to use your pretrained network, then you can detect potential boxes of interest then apply your network on each. These boxes can be determined with an "objectness" method, such as EdgeBox.
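A minimal sketch of that "proposals, then classify" idea. The `classify` callable is hypothetical: it stands for cropping the box from the image and running your retrained classifier on the crop. The proposal boxes would come from an objectness method such as EdgeBox, which is not implemented here. Overlapping detections of the same class are pruned with a simple non-maximum suppression, and both thresholds are assumed values.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def detect(proposals, classify, min_score=0.5, iou_thresh=0.5):
    """Score every proposal with classify(box) -> (label, score), then keep
    the best box per object using non-maximum suppression."""
    scored = []
    for box in proposals:
        label, score = classify(box)
        if score >= min_score:
            scored.append((score, label, box))
    scored.sort(key=lambda d: d[0], reverse=True)
    kept = []
    for score, label, box in scored:
        if all(k_label != label or iou(box, k_box) < iou_thresh
               for _, k_label, k_box in kept):
            kept.append((score, label, box))
    return kept
```

Note that a classifier trained only on object crops has no notion of "background", so the score threshold does a lot of work in this setup.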
However, nowadays object detection is usually done in a more integrated way, as in Faster RCNN. Such an approach integrates a layer named Region Proposal Network (RPN), which determines the regions of interest jointly with the recognition of the classes.
To the best of my knowledge, one of the best recent approaches is Yolo, but it is natively based on Darknet.