I need to use a multimodal dataset to detect objects for an autonomous vehicle project.
The nuScenes dataset just came out with a lot of data: https://www.nuscenes.org/. Does anyone know how to project the lidar point cloud onto the camera image plane with a projection matrix, just like for the KITTI dataset?
Have a look at the official nuscenes SDK and the function map_pointcloud_to_image.
I believe it does exactly what you have in mind.
https://github.com/nutonomy/nuscenes-devkit/blob/master/python-sdk/nuscenes/nuscenes.py
You can call the map_pointcloud_to_image function by creating an instance of NuScenesExplorer.
from nuscenes.nuscenes import NuScenesExplorer
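For example, a minimal sketch of how you might call it (the dataroot, version and tokens below are illustrative; check your devkit version for the exact signature and return values of map_pointcloud_to_image):

import numpy as np
from nuscenes.nuscenes import NuScenes, NuScenesExplorer

# Load the mini split and wrap it in an explorer (paths/version are placeholders).
nusc = NuScenes(version='v1.0-mini', dataroot='/data/nuscenes', verbose=True)
explorer = NuScenesExplorer(nusc)

# Pick a sample and project the top lidar sweep into the front camera.
sample = nusc.sample[0]
lidar_token = sample['data']['LIDAR_TOP']
cam_token = sample['data']['CAM_FRONT']

# points: 2D pixel coordinates of the lidar points, coloring: per-point depth,
# im: the camera image (return values may differ between devkit versions).
points, coloring, im = explorer.map_pointcloud_to_image(lidar_token, cam_token)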
I am working on an object detection task. I am able to detect objects in the KITTI point cloud, and I am trying to use the same code on my own point cloud dataset. In the KITTI dataset the camera and lidar sensors use different coordinate systems; I have attached an image here for reference. For the camera the axes are (z, x, y) and for the lidar the axes are (x, y, z).
For the KITTI dataset they have also provided calibration information, and I am able to understand the projection matrix. I went through a few materials, e.g.:
Camera-Lidar Projection.
In the above link, the projection matrix is calculated as:
import numpy as np

R_ref2rect_inv = np.linalg.inv(R_ref2rect)    # rectified camera -> reference camera
P_cam_ref2velo = np.linalg.inv(velo2cam_ref)  # reference camera -> velodyne
proj_mat = R_ref2rect_inv @ P_cam_ref2velo    # '@' is matrix multiplication
My question is:
For my dataset the sensor setup is almost the same, so we can say the lidar and camera share the same coordinate system. So, in this case, how do I project a point from the camera to the lidar?
In other words,
If my object center is at (x=7.5, y=1.7, z=0.58) in the camera coordinate system, how do I find the same point in the lidar point cloud?
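For illustration, this is the kind of transform I mean (the axis mapping below is only a guess for my rig; a real setup would get the 4x4 matrix, including signs and translation, from the extrinsic calibration):

import numpy as np

# Hypothetical rigid transform from camera to lidar coordinates.
# Here it is a pure axis permutation: camera z -> lidar x, camera x -> lidar y,
# camera y -> lidar z. A real rig would also have sign flips and a translation.
T_cam2velo = np.array([[0., 0., 1., 0.],
                       [1., 0., 0., 0.],
                       [0., 1., 0., 0.],
                       [0., 0., 0., 1.]])

p_cam = np.array([7.5, 1.7, 0.58, 1.0])   # homogeneous point in camera coordinates
p_velo = T_cam2velo @ p_cam               # same point in lidar coordinates
print(p_velo[:3])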
Data augmentation can easily be achieved using ad hoc modules in e.g. TensorFlow. This works perfectly for classification problems; however, when the objective of the network is the prediction of a geometrical feature, e.g. a landmark, a problem arises: as the image is modified, e.g. flipped or distorted, the corresponding labels also need to be adapted.
1 - Is there any tool to do this? I am sure that this is a common problem.
2 - Would it be useful to create a data augmentation script for neural networks that predict geometrical features?
I want to understand if I need to code all of this by myself or if I am missing something that already exists. If I need to do it and it could be useful, I would just create an open source tool.
You can use the imgaug library: https://github.com/aleju/imgaug
An example of augmenting images and keypoints with imgaug can be found here: https://github.com/aleju/imgaug#example-augment-images-and-keypoints
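Roughly, it looks like this (a minimal sketch along the lines of the README example; the exact API can differ between imgaug versions):

import numpy as np
import imgaug as ia
import imgaug.augmenters as iaa

image = np.zeros((128, 128, 3), dtype=np.uint8)              # dummy image
kps = ia.KeypointsOnImage([ia.Keypoint(x=30, y=40),
                           ia.Keypoint(x=80, y=20)], shape=image.shape)

seq = iaa.Sequential([iaa.Fliplr(0.5), iaa.Affine(rotate=(-10, 10))])
seq_det = seq.to_deterministic()                             # same transform for image and labels
image_aug = seq_det.augment_image(image)
kps_aug = seq_det.augment_keypoints([kps])[0]                # landmarks move with the image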
I am using the TensorFlow Object Detection API for an object detection task. However, my objects are captured from a high angle (camera at 10 m) and appear very small, while the image size is 1920 x 1080.
Questions:
1) What is the best way to detect small objects under these conditions?
2) What are the features of a suitable dataset? Images from the same views (maybe!)?
I appreciate all of your answers, Thanks :)
You have to consider the object detector's input size, even if you use a high-resolution image such as 1920x1080.
Object detectors resize the input image to their architecture's input size (e.g. YOLO typically uses a 416x416 input).
In other words, if you use the 1920x1080 image as it is, the API will resize it down to a small resolution like 416x416.
This means your small objects will effectively disappear while passing through the convolution filters.
In my opinion,
1) If you know where the small objects are located in the whole image, crop those regions out and use the crops as input images (see the sketch after this list).
Even if you do not know where the small objects are, you can generate several candidate crops by tiling or some other splitting method.
2) I don't understand what you want to know; please be more specific.
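For 1), a minimal sketch of such a crop-and-separate approach (the tile size and overlap are arbitrary placeholders; the detector call itself is not shown):

import numpy as np

def make_tiles(image, tile=640, overlap=0.2):
    # Split a large frame into overlapping square crops so that small objects
    # stay large relative to the detector's internal resize (e.g. 416x416).
    step = int(tile * (1 - overlap))
    h, w = image.shape[:2]
    ys = sorted(set(list(range(0, max(h - tile, 1), step)) + [max(h - tile, 0)]))
    xs = sorted(set(list(range(0, max(w - tile, 1), step)) + [max(w - tile, 0)]))
    tiles = []
    for y in ys:
        for x in xs:
            tiles.append(((x, y), image[y:y + tile, x:x + tile]))
    return tiles  # run the detector per tile, then shift the boxes back by (x, y)

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)  # dummy 1920x1080 frame
print(len(make_tiles(frame)))                      # number of candidate crops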
I think you should try the "faster_rcnn_resnet101" model with the KITTI config; it has a max image dimension of 1987, but this model is very slow compared to the SSD models. The configuration link is below:
https://github.com/tensorflow/models/blob/001a2a61285e378fef5f45386f638cb5e9f153c7/research/object_detection/samples/configs/faster_rcnn_resnet101_kitti.config
Also, Faster R-CNN models do a better job than YOLO at small object detection; I am not sure about the performance with the SSD models.
I obtain depth and reflectance maps from a lidar (2D images), and I also have camera images (2D images). The images have the same size.
I want to use a CNN to perform object detection using both kinds of images, i.e. a sort of "fusion CNN".
How am I supposed to do it? Am I supposed to use a pre-trained model? But there is no pre-trained model that uses lidar images...
Which is the best CNN algorithm for this, i.e. for performing fusion of modalities for object detection?
Thank you in advance.
Am I supposed to use a pre-trained model?
Yes, you should, unless you are super confident that you can find a working model directly by yourself.
But there is no pre-trained model that uses lidar images
First, I'm pretty sure there are lidar-based networks, e.g.:
L. Caltagirone et al., "LIDAR-Camera Fusion for Road Detection Using Fully Convolutional ...", arXiv, 2018.
Second, even if there is no open-source implementation of a directly lidar-based network, you can always convert the lidar data to a depth image. For depth-image-based CNNs, there are hundreds of implementations for segmentation and detection.
How am I supposed to do it?
First, you can run two branches side by side in parallel, one for the RGB image and one for the depth/lidar data, and feed them separately (late fusion).
Second, you can also combine them by merging the inputs into a single multi-channel tensor (early fusion), transfer the initial weights to that single model, and finally perform transfer learning on your own dataset.
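As a rough sketch of both options in Keras (the input shapes, channel counts and layer sizes are placeholders, not a tuned architecture):

import tensorflow as tf

# Early fusion: stack RGB (3 ch) + lidar depth/reflectance (2 ch) into one 5-channel input.
early_in = tf.keras.Input(shape=(256, 256, 5))
x = tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu')(early_in)
early_model = tf.keras.Model(early_in, x)

# Late fusion: two parallel branches whose feature maps are concatenated before the head.
rgb_in = tf.keras.Input(shape=(256, 256, 3))
lidar_in = tf.keras.Input(shape=(256, 256, 2))
rgb_feat = tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu')(rgb_in)
lidar_feat = tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu')(lidar_in)
fused = tf.keras.layers.Concatenate()([rgb_feat, lidar_feat])
late_model = tf.keras.Model([rgb_in, lidar_in], fused)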
best CNN algorithm?
Totally depends on your task and hardware. Do you need the best processing speed or the best accuracy? Please define your "best".
Also, are you using it for an autonomous car or for an in-house nursing-care system? Different CNN systems tune their weights for different purposes.
Generally, for real-time multiple-object detection on a cheap PC, e.g. a DJI Manifold, I would suggest Tiny-YOLO.
I'm about to start developing a neural net with TensorFlow, but before I get too deep into it, I was hoping I could get some feedback on exactly what type of neural net I will need for this (if a net is the right way to go about this at all).
I need the NN to take an image as input and output another image. This will be used for path mapping on a robot I'm working on. The input image will be a disparity map, and the output will be a "driveable map" (an image that shows which parts of the scene can be driven on and which can't).
I have built a dataset using Unity 3D. Here is an example from the set:
disparity map: [image]
driveable map: [image]
As you can probably see, white represents the area where my robot can drive and black is where it can't. I will need the NN to take a disparity map and give me back a "driveable map". Can this be done?
Thanks!
Sorry, I'm not an expert. Since there hasn't been a response on this, and in case you are still looking, the vocabulary I would use to describe this type of problem is image-to-image prediction and segmentation. Your best bet may be a specific type of segmentation network: U-Net.
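For what it's worth, a very small sketch of the U-Net idea in Keras (the depth, filter counts and input size are placeholders, not a recommended configuration):

import tensorflow as tf

# Tiny U-Net-style encoder-decoder: disparity map in, driveable-area mask out.
inp = tf.keras.Input(shape=(256, 256, 1))                       # disparity map
c1 = tf.keras.layers.Conv2D(16, 3, padding='same', activation='relu')(inp)
p1 = tf.keras.layers.MaxPooling2D()(c1)                         # downsample
c2 = tf.keras.layers.Conv2D(32, 3, padding='same', activation='relu')(p1)
u1 = tf.keras.layers.UpSampling2D()(c2)                         # upsample back
m1 = tf.keras.layers.Concatenate()([u1, c1])                    # skip connection
out = tf.keras.layers.Conv2D(1, 1, activation='sigmoid')(m1)    # white = driveable
model = tf.keras.Model(inp, out)
model.compile(optimizer='adam', loss='binary_crossentropy')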