How do we project from camera to lidar coordinates when both sensors share the same coordinate system? - object-detection

I am working on an object detection task. I am able to detect objects in the KITTI point cloud, and I am trying to use the same code on my own point cloud dataset. In the KITTI dataset the camera and lidar sensors use different coordinate systems; I have attached an image here for reference. For the camera the axes are (z, x, y) and for the lidar the axes are (x, y, z).
For the KITTI dataset they also provide calibration information, and I am able to understand the KITTI projection matrix. I went through a few materials:
Camera-Lidar Projection.
In the above link the author calculates the projection matrix as:
R_ref2rect_inv = np.linalg.inv(R_ref2rect)    # rectified cam -> reference cam
P_cam_ref2velo = np.linalg.inv(velo2cam_ref)  # reference cam -> velodyne
proj_mat = P_cam_ref2velo @ R_ref2rect_inv    # rectified cam -> velodyne
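For context, this 4x4 matrix maps homogeneous points from the rectified camera frame into the velodyne frame. A minimal sketch of how it would be applied (proj_mat as computed above):

import numpy as np

p_cam = np.array([1.0, 2.0, 3.0, 1.0])  # homogeneous (x, y, z, 1) in rectified camera coords
p_velo = proj_mat @ p_cam               # apply the 4x4 camera->velodyne transform
x, y, z = p_velo[:3] / p_velo[3]        # back from homogeneous to 3D velodyne coordinates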
My question is:
For my dataset the sensor setup is almost the same, so we can say the lidar and camera share the same coordinate system. In this case, how do I project a point from the camera to the lidar?
In other words,
If my object center is (x=7.5, y=1.7, z=0.58) in the camera coordinate system, how do I find the same point in the lidar point cloud?
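If the two sensors really share the same origin and only the axis conventions differ, the camera-to-lidar "projection" reduces to a permutation of the axes. A minimal sketch, assuming the KITTI-style conventions from the question (camera: x right, y down, z forward; lidar: x forward, y left, z up); verify the signs against your own sensor setup:

import numpy as np

def cam_to_lidar(p_cam):
    # Remap camera axes (x right, y down, z forward) to
    # lidar axes (x forward, y left, z up); same origin assumed.
    x_c, y_c, z_c = p_cam
    return np.array([z_c, -x_c, -y_c])

# The object center from the question:
print(cam_to_lidar([7.5, 1.7, 0.58]))  # -> [ 0.58 -7.5  -1.7 ]

If the sensors are only approximately co-located, you still need the small rigid transform (rotation plus translation) between them; the permutation above only handles the change of axis convention.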

Related

Does data augmentation always help in neural network training for 3D object detection?

Recently, transformer-based models have been used to process 3D point clouds for 3D point annotation in 3D object detection. I found that data augmentations (e.g., random shift, scale, and mirroring along the Y axis) hinder the performance of pseudo-label generation on the training data. Is this phenomenon normal? Are there any theories that can explain or support this?

Vehicle Detection using CNN

I am working on a vehicle detection project. Basically I have two sensors, radar and camera. I would determine a region of interest in the camera image using radar features, then examine whether that region contains a vehicle through ML prediction. I am planning to train my model from resources available online, such as:
-https://cogcomp.seas.upenn.edu/Data/Car/
-http://ai.stanford.edu/~jkrause/cars/car_dataset.html
I was reviewing a few papers and some Stack Exchange threads. Most of them support CNNs as the approach to opt for.
My query is: the training dataset images are 64x64 with two labels (Vehicle and Non-Vehicle). However, my input image could be at any scale (e.g., the region predicted to contain a car from the radar sensor). Can we design a DNN to predict the exact region of the car in such an image?
Apologies if the logic of the question is bad; I'm completely new to DL and ML.

Indoor point cloud instance segmentation training

I'm having some basic trouble understanding how some of the neural networks for point cloud instance segmentation are implemented. For instance, some networks trained and tested on the Stanford indoor dataset are trained on whole indoor scenes annotated with different objects; at test time, given another indoor scene, the networks produce an instance-segmented point cloud.
My question is: what if I have a dataset containing, as point clouds, all the objects that can be found in my test scene, and I train the network on this dataset? To be clear, I don't have a scene annotated with different classes like the Stanford dataset; I only have objects as point clouds, without any background details.
At test time I give the network a scene. Can it detect and segment the test-scene point cloud to recognise only the objects it was trained on? Understanding the rest of the scene is not that important for my use case.
It would be really helpful if someone could tell me what I'm not understanding properly.

NuScenes Dataset: How to project the lidar point cloud into the camera image plane?

I need to use a multimodal dataset in order to detect objects for an autonomous vehicle project.
The NuScenes dataset just came out with a lot of data: https://www.nuscenes.org/. Does anyone know how to project the lidar point cloud into the camera image plane with a projection matrix, just like for the KITTI dataset?
Have a look at the official nuScenes SDK and the function map_pointcloud_to_image.
I believe it does exactly what you have in mind.
https://github.com/nutonomy/nuscenes-devkit/blob/master/python-sdk/nuscenes/nuscenes.py
You can call the map_pointcloud_to_image function by creating an instance of NuScenesExplorer:
from nuscenes.nuscenes import NuScenesExplorer
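For context, a minimal usage sketch; the dataroot path and the choice of the v1.0-mini split are assumptions, and the token lookup follows the devkit's schema:

from nuscenes.nuscenes import NuScenes, NuScenesExplorer

nusc = NuScenes(version='v1.0-mini', dataroot='/data/sets/nuscenes', verbose=True)
explorer = NuScenesExplorer(nusc)

# Lidar and front-camera sample_data tokens for the first sample.
sample = nusc.sample[0]
lidar_token = sample['data']['LIDAR_TOP']
cam_token = sample['data']['CAM_FRONT']

# Projects the point cloud into the image plane and returns the 2D points,
# a per-point coloring (depth by default) and the camera image.
points, coloring, im = explorer.map_pointcloud_to_image(lidar_token, cam_token)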

Using Lidar images and Camera images to perform object detection

I obtain depth and reflectance maps from lidar (as 2D images), and I also have camera images (2D images). The images have the same size.
I want to use a CNN to perform object detection using both images, a sort of "fusion CNN".
How am I supposed to do it? Am I supposed to use a pre-trained model? But there is no pre-trained model using lidar images...
Which is the best CNN algorithm to do it, i.e., for performing fusion of modalities for object detection?
Thank you in advance.
Am I supposed to use a pre-trained model?
Yes, you should, unless you are confident that you can build a working model from scratch by yourself.
But there is no pre-trained model using lidar images
First, I'm pretty sure there are lidar-based networks, e.g.:
L. Caltagirone et al., "LIDAR-Camera Fusion for Road Detection Using Fully Convolutional Neural Networks", arXiv, 2018.
Second, even if there is no open-source implementation that is directly lidar-based, you can always convert the lidar point cloud to a depth image. For depth-image-based CNNs, there are hundreds of implementations for segmentation and detection.
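As a rough illustration of that conversion, here is a sketch that projects lidar points (already expressed in the camera frame) into a sparse depth image through a pinhole intrinsic matrix K; all names here are hypothetical:

import numpy as np

def lidar_to_depth_image(points_cam, K, height, width):
    # points_cam: (N, 3) lidar points already in the camera frame.
    # K: 3x3 pinhole intrinsic matrix. Returns a sparse float32 depth image.
    pts = points_cam[points_cam[:, 2] > 0]        # keep points in front of the camera
    uvw = (K @ pts.T).T                           # pinhole projection
    uv = (uvw[:, :2] / uvw[:, 2:3]).astype(int)   # pixel coordinates
    inside = ((uv[:, 0] >= 0) & (uv[:, 0] < width) &
              (uv[:, 1] >= 0) & (uv[:, 1] < height))
    uv, z = uv[inside], pts[inside, 2]
    depth = np.zeros((height, width), dtype=np.float32)
    order = np.argsort(-z)                        # write far points first ...
    depth[uv[order, 1], uv[order, 0]] = z[order]  # ... so near ones overwrite them
    return depth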
How am I supposed to do it?
First, you can process the two modalities in parallel branches, one for RGB and one for the depth/lidar image, feeding them separately and fusing the feature maps later.
Second, you can combine them by merging the inputs into a single 4-channel tensor and transferring the initial weights to a single model, then perform transfer learning on your dataset, as sketched below.
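A minimal sketch of the second option, assuming PyTorch and a torchvision ResNet-18 backbone (any backbone works; only the first convolution needs to change):

import torch
import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)
old_conv = model.conv1                      # pretrained 3-channel stem
new_conv = nn.Conv2d(4, 64, kernel_size=7, stride=2, padding=3, bias=False)
with torch.no_grad():
    new_conv.weight[:, :3] = old_conv.weight                             # reuse the RGB filters
    new_conv.weight[:, 3:] = old_conv.weight.mean(dim=1, keepdim=True)   # init the depth channel with their mean
model.conv1 = new_conv

rgbd = torch.randn(2, 4, 224, 224)          # RGB stacked with the lidar depth map
out = model(rgbd)                           # then fine-tune end-to-end on your data

Initializing the extra channel with the mean of the pretrained RGB filters is a common heuristic that roughly preserves the pretrained activation statistics.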
Which is the best CNN algorithm?
That totally depends on your task and hardware. Do you need the best processing speed or the best accuracy? Define your "best", please.
Also, are you using it for an autonomous car or for an in-house nursing-care system? Different CNN systems customize their weights for different purposes.
Generally, for real-time multi-object detection on a cheap embedded PC, e.g., a DJI Manifold, I would suggest Tiny-YOLO.