I am currently working with MediaPipe Pose estimation for a fitness project that needs to run across different mobile devices. We have been using the MediaPipe AAR (full version). What I have found, for example, is that on a Samsung S21 the elbow landmark is estimated towards the back of the elbow, whereas on an iPhone it is estimated closer to the middle of the elbow. This has a significant impact on our ability to give angle feedback to users and makes the results difficult to reference consistently.
Is there any step or setting I can apply to get more consistent results across devices? Would it be better to use the ML Kit version of MediaPipe? Or is the only option to use an external camera to standardize the results?
Thank you.
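For the angle feedback part, here is a minimal sketch of how an elbow angle could be computed from three pose landmarks. The landmark indices (11 = left shoulder, 13 = left elbow, 15 = left wrist) and normalized coordinates are assumptions based on the standard MediaPipe Pose topology, and the sample values are made up:

```python
# Hypothetical sketch: elbow angle from three MediaPipe Pose landmarks.
# Assumes the standard topology (11 = left shoulder, 13 = left elbow,
# 15 = left wrist) and the normalized x/y/z values the solution returns.
import numpy as np

def joint_angle(a, b, c):
    """Angle at point b (in degrees) formed by the segments b->a and b->c."""
    a, b, c = np.asarray(a, float), np.asarray(b, float), np.asarray(c, float)
    v1, v2 = a - b, c - b
    cos_angle = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))
    return np.degrees(np.arccos(np.clip(cos_angle, -1.0, 1.0)))

# Made-up landmark coordinates (shoulder, elbow, wrist):
shoulder, elbow, wrist = (0.52, 0.30, 0.0), (0.55, 0.45, 0.0), (0.50, 0.58, 0.0)
print(joint_angle(shoulder, elbow, wrist))  # elbow flexion angle in degrees
```

Whichever device the landmarks come from, computing the angle from the same three indices at least keeps the feedback calculation itself identical across platforms.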
I'm in the process of buying a 7.5 acre plot of land in a wooded, hilly area. I would estimate that the elevation varies about 50 feet from the bottom of the creek to the top of the hill. I would like to find a good method for measuring the topography of the land so I can create a 3D model. It would be tremendously useful to be able to try out different land development ideas and to simulate locations for future buildings.
My low-tech version of doing this would be to set up a laser level and go around taking elevation measurements in a 3' or so grid pattern. As I thought about that, I realized that smartphones and similar devices have quite a few sensors built in that might make this a lot easier.
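As a rough sketch of what could be done with such grid readings, assuming each measurement is recorded as an (x, y, elevation) triple, the points could be interpolated onto a regular grid and then meshed or exported as a heightmap in any 3D package:

```python
# Rough sketch (assumed data format): turn scattered (x, y, elevation)
# readings from the laser-level survey into a regular elevation grid.
import numpy as np
from scipy.interpolate import griddata

# Each row: easting (ft), northing (ft), elevation (ft) -- made-up samples.
readings = np.array([
    [0.0, 0.0, 101.2],
    [3.0, 0.0, 101.6],
    [0.0, 3.0, 102.1],
    [3.0, 3.0, 102.4],
    [1.5, 6.0, 103.0],
])

# Regular grid covering the measured area.
xi = np.linspace(readings[:, 0].min(), readings[:, 0].max(), 50)
yi = np.linspace(readings[:, 1].min(), readings[:, 1].max(), 50)
X, Y = np.meshgrid(xi, yi)

# Interpolate elevations onto the grid; the result is a heightmap that any
# 3D modelling tool can turn into a terrain mesh.
Z = griddata(readings[:, :2], readings[:, 2], (X, Y), method="linear")
```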
I learned about software that will use a drone to capture data and images to automatically generate a topo map and 3D model. Drone Deploy is one such tool. I do have a DJI Phantom 4, but I don't know if it's feasible to fly such an intricate path among trees to scan the entire property. I wonder if there's another way to use this amazing modern hardware (phone or drone) to make my task easy.
I would appreciate hearing any thoughts and ideas about this!
The thing with DroneDeploy is that you fly above the trees, usually at around 30 meters, in a cross pattern.
Why do you want to fly between the trees? You have to explain that first.
I want to create an application that converts 2D images/video into a 3D model. While researching this, I found similar applications like Trnio, Scann3D, Qlone, and a few others (though some of them produce poor 3D models as output). I also found a technology from Microsoft Research called MobileFusion that showed the same vision I had for my application, but none of those apps were like it.
Creating a 3D modelling app is a complex task, and achieving a high standard requires a lot of study. To point you in the right direction, you most likely want to perform something called Structure from Motion (SfM) or Simultaneous Localization and Mapping (SLAM).
If you want to program this yourself, OpenCV is a good place to start if you know C++ or Python. A typical pipeline involves feature extraction and matching, camera pose estimation, triangulation, and then refinement with bundle adjustment. All pipelines for SfM and SLAM follow these general steps (with exceptions, of course). All of these steps are possible in OpenCV, although Google's Ceres Solver is an excellent open-source bundle adjuster. SfM generally goes on to dense matching, which is where you get very dense point clouds that are good for creating meshes. A free open-source pipeline for this is OpenSfM. Another good source of tools is OpenMVG, which has all of the tools you need to build a full pipeline.
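As a rough illustration of those steps (not a production pipeline), here is a minimal two-view sketch in Python with OpenCV. The file names and intrinsic matrix K are placeholders you would replace with your own data and calibration:

```python
# Minimal two-view structure-from-motion sketch with OpenCV.
# Assumes two overlapping images and known camera intrinsics K; a real
# pipeline adds many more views plus bundle adjustment (e.g. Ceres).
import cv2
import numpy as np

img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)  # placeholder file names
img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])  # placeholder intrinsics

# 1. Feature extraction and matching
orb = cv2.ORB_create(5000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# 2. Camera pose estimation via the essential matrix
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)

# 3. Triangulation into a sparse 3D point cloud
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
points3d = (pts4d[:3] / pts4d[3]).T  # homogeneous -> Euclidean

# 4. Bundle adjustment (not shown) would jointly refine R, t and points3d.
```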
SLAM is similar to SfM but has more of a focus on real-time operation and less on absolute accuracy. Its applications are more centred around robotics, where a robot wants to know where it is relative to its environment but is not so concerned with absolute accuracy. The best-known SLAM algorithms are ORB-SLAM and LSD-SLAM. Both are open source and free for you to use in your own software.
So really it depends on what you want: SfM for high accuracy, SLAM for real time. If you want a good 3D model, I would recommend using existing algorithms, as they are very good.
The best commercial software, in my opinion, is Agisoft PhotoScan; if you can make anything half as good as that, I'd be very impressed. To answer your question about what resources you will require: in my opinion, Python/C++ skills, the ability to Google well, and spare time to read up properly on photogrammetry and SfM.
I am working with a Kinect and have implemented a skeleton tracking mechanism to record the joint trajectories of a human gait sequence using a single Kinect. The problem is the accuracy of the measurements: I want to overcome occlusion and jitter. I found some filter implementations that deal with jitter (more advanced than averaging filters), but occlusion is more difficult. Is there a good open-source project for skeleton tracking with more accurate results? Microsoft SDK or OpenNI, it doesn't matter.
Thanks in advance
Actually, I did gait recognition using a Kinect and the official Microsoft SDK. The accuracy of the measurements depends on how you do your calculations. Check this as an example.
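For instance, a gait measure such as step length comes down to simple geometry on the recorded joint positions. A hedged sketch, assuming each skeleton frame is stored as a dictionary of 3D joint positions in metres keyed by the SDK's joint names (the names and layout here are assumptions):

```python
# Hedged sketch (assumed data layout): estimate step lengths from a recorded
# sequence of skeleton frames, each a dict of 3D joint positions in metres.
import numpy as np

def step_lengths(frames):
    """Peak horizontal ankle separations, one per step, from per-frame joints."""
    separations = []
    for frame in frames:
        left = np.asarray(frame["AnkleLeft"], float)   # assumed joint names
        right = np.asarray(frame["AnkleRight"], float)
        # Horizontal (x/z plane) distance between the ankles in this frame.
        separations.append(np.hypot(left[0] - right[0], left[2] - right[2]))
    separations = np.array(separations)
    # Local maxima of the separation curve approximate successive step lengths.
    peaks = (separations[1:-1] > separations[:-2]) & (separations[1:-1] > separations[2:])
    return separations[1:-1][peaks]
```

The same pattern (pull out the relevant joints per frame, then compute distances or angles) applies to most gait parameters; the filtering you apply to the raw joints beforehand is what mostly determines the accuracy.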
How can I track fast hand movement using the Kinect?
I've tried both OpenNI and the Microsoft SDK to track the hand. With both of them, there is a lot of jitter and the joint movement is inaccurate.
Here is an example video of Kinect Fruit Ninja: Example Video
In that video there is no jitter or inaccuracy, and it also tracks fast hand movements.
What am I missing? Are there particular Kinect hardware versions or types I should look into?
My best guess is that Fruit Ninja applies some sort of smoothing at some point. What you're seeing in that video is almost certainly not the raw data they're getting from the Kinect. The data from the Kinect will always have some kind of jitter; real-world sensor data almost always does. You'll need to smooth it - exactly how to do that depends on the application; it could be something simple like modelling a kind of damping and/or inertia on the point that's being moved by the hand (which is what I suspect Fruit Ninja is doing), or you could look at something like a Kalman filter for a robust (but more computationally-intensive) way to reduce the noise in your sensor readings.
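As an illustration of the damping/inertia idea, here is a minimal sketch that exponentially smooths a joint position each frame; the alpha value and data layout are just assumptions, and a Kalman filter would replace this with a proper motion model and measurement-noise estimates:

```python
# Illustrative sketch: exponential smoothing of a noisy hand-joint position.
# alpha trades responsiveness against jitter (lower = smoother but laggier).
import numpy as np

class SmoothedJoint:
    def __init__(self, alpha=0.35):
        self.alpha = alpha
        self.state = None

    def update(self, raw_position):
        raw = np.asarray(raw_position, dtype=float)
        if self.state is None:
            self.state = raw
        else:
            # Move only part of the way toward the new noisy reading.
            self.state = self.alpha * raw + (1.0 - self.alpha) * self.state
        return self.state

# Usage: feed each new (x, y, z) hand position from the sensor per frame.
smoother = SmoothedJoint(alpha=0.35)
for raw in [(0.10, 0.52, 1.9), (0.14, 0.50, 1.9), (0.11, 0.53, 1.9)]:
    print(smoother.update(raw))
```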
I want to evaluate the performance of several SDKs / frameworks for depth cameras. These cameras can either be using Time-of-Flight or structured light.
The framework should be capable (at least) of person tracking / blob detection and gesture recognition.
So far I found the following frameworks:
OpenNI (structured light only)
Microsoft Kinect SDK (Kinect only)
Beckon SDK by Omek Interactive (ToF and structured light)
iisu by SoftKinetic (ToF and structured light)
Are there any other frameworks I should be aware of?
EDIT: I found this article by Techradar that seems to indicate that these are indeed the only options currently available.
Any feedback would be very much appreciated!
I have found some interesting links on this. You could take MIT's approach using CoDAC. They list lots of facts in their post; I will quote the most important ones here.
9. What are limitations of this technique?
The main limitation of our framework is inapplicability to scenes with curvilinear objects, which would require extensions of the current mathematical model. Another limitation is that a periodic light source creates a wrap-around error, as it does in other TOF devices. For scenes in which surfaces have high reflectance or texture variations, availability of a traditional 2D image prior to our data acquisition allows for improved depth map reconstruction, as discussed in our paper.
10. What are advantages of this technique/device and how does it compare with existing TOF-based range sensing techniques?
In laser scanning, spatial resolution is limited by the scanning time. TOF cameras do not provide high spatial resolution because they rely on a low-resolution 2D array of range-sensing pixels. CoDAC is a single-sensor, high-spatial-resolution depth camera which works by exploiting the sparsity of natural scene structure.
11. What is the range resolution and spatial resolution of the CoDAC system?
We have demonstrated sub-centimeter range resolution in our experiments. This is significantly better than the fundamental limit of about 10 cm that would arise from using a detector with a 0.7 nanosecond rise time if we were not using parametric signal modeling. The improvement in range resolution comes from the parametric modeling and deconvolution in our framework. We refer the reader to our publications for complete details and analysis. We have demonstrated 64-by-64 pixel spatial resolution, as this is the spatial resolution of our spatial light modulator. Spatial patterning with a digital micromirror device (DMD) will enable much higher spatial resolution. Our experiments use only 205 projection patterns, which corresponds to just 5% of the number of pixels in the reconstructed depth map. This is a significant improvement over raster scanning in LIDAR, and it is obtained without the 2D sensor array used in TOF cameras.
Another interesting project I found on YouTube uses libfreenect and libusb.
There is also dSensingNI, which is described as:
This work presents an approach to overcome the disadvantages of existing interaction frameworks and technologies for touch detection and object interaction. The robust and easy-to-use framework dSensingNI (Depth Sensing Natural Interaction) is described, which supports multitouch and tangible interaction with arbitrary objects. It uses images from a depth-sensing camera, provides tracking of users' fingers or palms, and combines this with object interaction, such as grasping, grouping and stacking, which can be used for advanced interaction techniques.
So you have hit most of the ones out there, especially those that use the Kinect, but there are a few other options as well. Hope this helps!