Can I use the TensorFlow Object Detection API to detect any object that comes into my product's path, so that I can stop the product's movement? I have done custom object detection before, but here I can't train on every object that might interrupt the product's path. So is it possible to use the TensorFlow API as a kind of collision detection?
With object detection, you can identify objects along with their location and extent in an image. This would be an option for checking whether specific objects are blocking your path. There is also the option of detecting/segmenting unknown objects (as described here). However, what you are after sounds more like depth estimation or even SLAM.
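As a rough sketch of that first option (not part of the original answer): given detector output in the usual boxes-plus-scores form, you can check whether any confident detection overlaps a "path corridor" in the image. The box format and the corridor coordinates below are assumptions:

    import numpy as np

    def path_blocked(boxes, scores, path_region, score_thresh=0.5):
        """Check whether any detected box overlaps an assumed 'path' region.

        boxes: (N, 4) array of [ymin, xmin, ymax, xmax] in normalized coordinates
               (the format used by typical TF detection models -- an assumption here).
        path_region: [ymin, xmin, ymax, xmax] corridor in front of the product.
        """
        py0, px0, py1, px1 = path_region
        for (y0, x0, y1, x1), s in zip(boxes, scores):
            if s < score_thresh:
                continue
            # rectangle overlap test between the detection and the path corridor
            if x0 < px1 and x1 > px0 and y0 < py1 and y1 > py0:
                return True
        return False

    # toy example: one confident detection sitting in the lower-central corridor
    boxes = np.array([[0.6, 0.45, 0.9, 0.6]])
    scores = np.array([0.8])
    print(path_blocked(boxes, scores, path_region=[0.5, 0.3, 1.0, 0.7]))  # True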
One example of depth estimation is monodepth, a neural network that can estimate the depth for each pixel from a single camera image. You could use that to verify whether your path is clear or whether something in front of your product is blocking the path.
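A minimal sketch of that check, assuming you already have a per-pixel depth map in meters (e.g. predicted by a monocular depth network); the region of interest and the thresholds are placeholders to tune:

    import numpy as np

    def obstacle_ahead(depth_m, stop_distance=0.5, min_pixels=200):
        """True if enough pixels in a lower-central region are closer than stop_distance.

        depth_m: (H, W) depth map in meters.
        """
        h, w = depth_m.shape
        roi = depth_m[h // 3:, w // 3: 2 * w // 3]   # the part of the image the product drives into
        return np.count_nonzero(roi < stop_distance) >= min_pixels

    depth = np.full((480, 640), 3.0)      # everything 3 m away
    depth[300:400, 280:360] = 0.4         # a close obstacle in the corridor
    print(obstacle_ahead(depth))          # True -> stop the product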
The other one, SLAM (simultaneous localization and mapping), might be a bit over the top for just checking whether you can drive somewhere. Anyway, SLAM solves the task of navigating an unknown environment by building an internal model of the world while at the same time estimating its own location inside this model, in order to solve navigation tasks.
I'm working on an object tracking project.
Steps:
1. Preprocess the image and obtain some candidate regions of interest.
2. For each region, test whether it is the target using ORB features with a brute-force (BF) matcher.
3. After the target region is determined, acquire the coordinates of some points on the target and their corresponding coordinates in the world coordinate system.
4. Use solvePnP (in OpenCV) to get the rotation vector and translation vector (a minimal sketch of this step follows the list).
5. The translation vector is used in VR for localization and view control.
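For reference, a minimal sketch of step 4; the camera intrinsics and the 3D-2D correspondences below are made-up placeholders:

    import numpy as np
    import cv2

    # assumed pinhole intrinsics (fx, fy, cx, cy are placeholders)
    K = np.array([[800.0,   0.0, 320.0],
                  [  0.0, 800.0, 240.0],
                  [  0.0,   0.0,   1.0]])
    dist = np.zeros(5)  # assume no lens distortion

    # 3D points on the target in the world coordinate system (meters, made up)
    object_pts = np.array([[0.0, 0.0, 0.0],
                           [0.1, 0.0, 0.0],
                           [0.1, 0.1, 0.0],
                           [0.0, 0.1, 0.0]])

    # their detected pixel coordinates in the current frame (made up)
    image_pts = np.array([[310.0, 250.0],
                          [370.0, 252.0],
                          [372.0, 310.0],
                          [308.0, 308.0]])

    ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, dist)
    print(ok, rvec.ravel(), tvec.ravel())  # rotation vector and translation vector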
By tracking jitter I mean that, although the object is stationary, the estimated position of the target changes slightly because of tracking errors such as noise. Then, looking at step 4 and step 5: due to that change, the translation vector also changes slightly, and with the head-mounted device I feel the jitter all the time.
It seems to me that tracking jitter is unavoidable because of changes in the environment or noise. But a change of a single pixel can lead to a change of a few centimeters in the z value of the translation vector. So is there a proper way to deal with it?
I have googled but there doesn't seem to be much information. Effects of Tracking Technology, Latency, and Spatial Jitter on Object Movement mentions the phenomenon but does not provide a solution. Another interesting paper is Motion Tracking Requirements and Technologies. So can anyone offer some useful information?
It occurs to me that a filter is needed to post-process the tracking data, but the idea is not very concrete. A Kalman filter can be used for tracking and to attenuate noise. I don't know whether it can compensate for this kind of jitter (I mean, very small fluctuations in values) very well, and investigating how to incorporate a Kalman filter into this project is another topic that needs extra time.
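For what it's worth, a minimal sketch of that post-processing idea, assuming OpenCV's cv2.KalmanFilter with a simple constant-position model over the translation vector; the noise covariances are placeholders that would need tuning:

    import numpy as np
    import cv2

    # constant-position Kalman filter over the 3D translation vector
    kf = cv2.KalmanFilter(3, 3)
    kf.transitionMatrix = np.eye(3, dtype=np.float32)            # x_k = x_{k-1} + noise
    kf.measurementMatrix = np.eye(3, dtype=np.float32)
    kf.processNoiseCov = np.eye(3, dtype=np.float32) * 1e-5      # expect little real motion
    kf.measurementNoiseCov = np.eye(3, dtype=np.float32) * 1e-2  # distrust the jittery measurements
    kf.errorCovPost = np.eye(3, dtype=np.float32)                # large initial uncertainty
    kf.statePost = np.zeros((3, 1), dtype=np.float32)

    def smooth_translation(tvec):
        """Feed the raw tvec from solvePnP, get a smoothed estimate back."""
        kf.predict()
        return kf.correct(tvec.astype(np.float32).reshape(3, 1)).ravel()

    # jittery z values around 2.0 m get pulled toward a stable estimate
    for z in (2.00, 2.03, 1.98, 2.02, 1.99):
        print(smooth_translation(np.array([0.0, 0.0, z])))

The trade-off is latency: the more you smooth, the slower the view reacts to real motion, so a constant-velocity model (6 states) may be worth trying if the object or camera actually moves.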
I am working on semantic segmentation using deep learning, and I have come across the terms semantic segmentation, instance detection, object detection and object segmentation.
What are the differences between them?
Some of the usage of these terms is subjective or context-dependent, but as far as I can tell a plausible reading of them is:
instance detection - given an instance (i.e. an image of a specific object) you need to detect it in an image / image set. Result can be either "Image i has instance X", a segmentation of the instance in all of its occurrences or anything in between.
object detection - depending on context can be the same as instance detection, or could mean that given a specific class of objects you want to detect all objects of this class that occur in an image / image set
object segmentation - take object detection and add segmentation of the object in the images it occurs in.
semantic segmentation - attempt to segment given image(s) into semantically interesting parts. This usually means pixel-labeling to a predefined class list.
Another question about image segmentation terminology can be found here and might be of some interest to you.
I am trying to render multiple models in DirectX 12 using only one graphics context, but the result is very strange and I have little idea what the reason is. Rendering result of the Sponza model from outside: the one on the right is the correct result and the one on the left has the problem.
Rendering result of the left Sponza (the one that has the problem) from inside.
Even though the two loaded meshes are the same, each model has its own vertex buffer, index buffer and SRVs. There is only one graphics context; when it is created, it is set with each model's index and vertex buffers, and then I call drawIndexed() to render each model. After the graphics context has been created, we execute it once per frame. However, if we create an individual graphics context for each model and execute all graphics contexts every frame, the rendering works fine but the frame rate drops a lot.
It would be very helpful if you could provide any hints about the reason for the weird result; providing a solution would be even better. Thank you very much in advance.
First, I would recommend you stay away from DX12 and stick to DX11, unless you are already a DX11 expert and you are in the top 1% of application cases, like triple-A games or applications with a very specific, high demand for control over GPU memory.
Without many details on your problem here, I can only give you a few basic pieces of advice:
Run with the debug layer enabled (via D3D12GetDebugInterface) and look at the console log; you will need to install the optional Windows feature named Graphics Tools.
Use frame capture tools, like the Visual Studio graphics debugger (VSGD) or Nsight from NVIDIA, and inspect your frame step by step.
Use Dx11, really
I have a project where I have to recognize an entire room so I can calculate the distances between objects (big ones, e.g. a bed, a table, etc.) and a person in that room. Is something like that possible using the Microsoft Kinect?
Thank you!
The Kinect provides you with the following:
Depth Stream
Color Stream
Skeleton information
It's up to you how you use this data.
To answer your question: the official Microsoft Kinect SDK doesn't provide shape detection out of the box. But it does provide skeleton data and face tracking, with which you can detect the distance of a user from the Kinect.
Also, by mapping the color stream to the depth stream you can detect how far a particular pixel is from the Kinect. In your implementation, if the different objects have unique characteristics such as color, shape and size, you can probably detect them and also measure their distance.
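Once a depth frame has been mapped into color space, the distance lookup itself is simple. A minimal sketch, with a synthetic depth frame standing in for the mapped SDK output (the resolution and values are placeholders):

    import numpy as np

    # placeholder: a depth frame already mapped to color space, in millimeters
    depth_mm = np.full((480, 640), 3000, dtype=np.uint16)  # everything 3 m away
    depth_mm[200:280, 300:380] = 1200                      # an object about 1.2 m away

    def distance_at(depth_mm, x, y):
        """Distance of pixel (x, y) from the sensor, in meters (0 means no reading)."""
        return depth_mm[y, x] / 1000.0

    print(distance_at(depth_mm, 320, 240))  # 1.2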
OpenCV is one of the libraries that I use for computer vision, etc.
Again, it's up to you how you use this data.
The Kinect camera provides depth and consequently 3D information (a point cloud) about matte objects in the range 0.5-10 meters. With this information it is possible to segment out the floor of the room (by fitting a plane) and possibly the walls and the ceiling. This step is important, since these surfaces often connect separate objects, making them one big object.
The remaining parts of the point cloud can be segmented by depth if they don't physically touch each other. Using color, one can separate the objects even further. Note that we implicitly define an object as a dense, color-consistent 3D entity, while other definitions are also possible.
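As a rough sketch of the floor removal and the depth-based segmentation, here is the idea using the Open3D library (my choice, not part of the original answer) on a synthetic cloud; the RANSAC and clustering thresholds are placeholders to tune for real Kinect data:

    import numpy as np
    import open3d as o3d

    # synthetic cloud: a floor plane plus two small boxes standing on it
    floor = np.c_[np.random.uniform(0, 4, (4000, 2)), np.zeros(4000)]
    box1 = np.random.uniform([1.0, 1.0, 0.0], [1.3, 1.3, 0.5], (500, 3))
    box2 = np.random.uniform([3.0, 2.5, 0.0], [3.4, 2.9, 0.8], (500, 3))
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(np.vstack([floor, box1, box2]))

    # 1) remove the dominant plane (the floor) so the objects become disconnected
    plane, inliers = pcd.segment_plane(distance_threshold=0.02, ransac_n=3, num_iterations=1000)
    objects = pcd.select_by_index(inliers, invert=True)

    # 2) cluster what remains and measure centroid-to-centroid distances
    labels = np.array(objects.cluster_dbscan(eps=0.15, min_points=10))
    pts = np.asarray(objects.points)
    centroids = [pts[labels == l].mean(axis=0) for l in range(labels.max() + 1)]
    if len(centroids) >= 2:
        print(np.linalg.norm(centroids[0] - centroids[1]))  # distance between the two objects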
As soon as your objects are segmented, you can measure the distances between the segments, analyse their shape, recognize artifacts or humans, etc. To the best of my knowledge, however, the skeleton tracking library can only recognize humans after they have moved for a few seconds. Below is a simple depth map that was broken into a few segments using depth but not color information.
I hope to find some hints on where to start with a problem I am dealing with.
I am using a Kinect sensor to capture 3D point clouds. I created a 3D object detector which is already working.
Here is my task:
Let's say I have point cloud 1. I detected an object in cloud 1 and I know the centroid position of my object (x1, y1, z1). Now I move my sensor along a path and create new clouds (e.g. cloud 2). In cloud 2 I see the same object, but e.g. from the side, where the object detection does not work well.
I would like to transform the detected object from cloud 1 to cloud 2, to get the centroid in cloud 2 as well. To me it sounds like I need a transformation (translation and rotation) to map a point from cloud 1 to cloud 2.
Any ideas how I could solve my problem?
Maybe ICP? Are there better solutions?
THX!
In general, this task is called registration. It relies on having a good estimate of which points in cloud 1 correspond to which points in cloud 2 (more specifically: given a point in cloud 1, which point in cloud 2 represents the same location on the detected object). There's a good overview in the PCL library documentation.
If you have such a correspondence, you're in luck and you can directly compute a rotation and translation, as demonstrated here.
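With known correspondences the usual closed-form estimate is the SVD-based (Kabsch) solution. A minimal NumPy sketch (my own, not the linked code), with a toy transform to check it:

    import numpy as np

    def rigid_transform(src, dst):
        """Least-squares R, t such that dst ≈ src @ R.T + t (rows are corresponding points)."""
        c_src, c_dst = src.mean(axis=0), dst.mean(axis=0)
        H = (src - c_src).T @ (dst - c_dst)        # cross-covariance
        U, _, Vt = np.linalg.svd(H)
        R = Vt.T @ U.T
        if np.linalg.det(R) < 0:                   # guard against a reflection
            Vt[-1] *= -1
            R = Vt.T @ U.T
        t = c_dst - R @ c_src
        return R, t

    # toy check: rotate and shift some points, then recover the transform
    src = np.random.rand(10, 3)
    R_true = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
    dst = src @ R_true.T + np.array([0.5, -0.2, 1.0])
    R, t = rigid_transform(src, dst)
    centroid_cloud2 = R @ np.array([0.1, 0.2, 0.3]) + t  # map the detected centroid
    print(np.allclose(R, R_true), t)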
If not, you'll need to estimate that correspondence. ICP does that for approximately aligned point clouds, but if your point clouds are not already fairly well aligned, you may want to start by estimating "key points" (such as book corners, distinct colors, etc.) in your point clouds, computing a rotation and translation from those as above, and then refining with ICP. As D.J.Duff mentioned, ICP works better in practice on point clouds that are already approximately aligned, because it estimates correspondences using one of two metrics: minimal point-to-point distance or minimal point-to-plane distance. According to Wikipedia, the latter works better in practice, but it does involve estimating normals, which can be tricky. If the correspondences are far off, the transforms likely will be as well.
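A sketch of the ICP refinement itself, using Open3D's point-to-point ICP (my choice of library; PCL's IterativeClosestPoint is the equivalent). The random arrays below only stand in for real, roughly pre-aligned clouds, and the distance threshold is a placeholder:

    import numpy as np
    import open3d as o3d

    def icp_align(source_pts, target_pts, threshold=0.05, init=np.eye(4)):
        """Refine the alignment of two roughly pre-aligned (N, 3) point sets with ICP."""
        source = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(source_pts))
        target = o3d.geometry.PointCloud(o3d.utility.Vector3dVector(target_pts))
        result = o3d.pipelines.registration.registration_icp(
            source, target, threshold, init,
            o3d.pipelines.registration.TransformationEstimationPointToPoint())
        return result.transformation  # 4x4 matrix mapping cloud 1 into cloud 2

    T = icp_align(np.random.rand(200, 3), np.random.rand(200, 3))
    centroid_cloud1 = np.array([0.1, 0.2, 0.3, 1.0])   # homogeneous coordinates
    print((T @ centroid_cloud1)[:3])                   # the centroid expressed in cloud 2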
I think what you were asking about relates in particular to the Kinect sensor and the API Microsoft released for it.
If you are not planning to do reconstruction, you can look into the AlignPointClouds function in the Sensor Fusion namespace. This should take care of it automatically, using methods similar to the answer given by @pnhgiol.
On the other hand, if you are looking at doing reconstruction as well as point cloud transforms, the Reconstruction class is what you are looking for. All of this can be found here.