How to do multi-person pose estimation with MediaPipe? - mediapipe

I want to use MediaPipe for pose estimation of multiple people in video footage, but with MediaPipe I can't get poses for more than one person. Is there any solution to this problem?

Related

How do I improve Mediapipe Pose Estimation landmark consistency across devices

I am currently working on MediaPipe pose estimation for a fitness project that runs across different mobile devices. We have been using the full version of the MediaPipe AAR. What I have found, for example, is that on a Samsung S21 the elbow point is estimated toward the back of the elbow, whereas on the iPhone it is estimated more toward the middle of the elbow. This significantly affects our ability to give angle feedback to users and makes the results difficult to reference.
Is there any step or setting that would give more consistent results across different devices? Would it be better to use the MLKit Mediapipe version? Or would the only way be to use an external camera to standardize the results?
Thank you.

Tensorflow: how to detect audio direction

I have a task: to determine the location of a sound source.
I have some experience working with TensorFlow, making predictions on simple features and datasets. I assume that for this task it would be necessary to analyze the sound frequencies, and probably other related data, in the training and prediction steps. The sound comes from a headset, so the human ear is able to detect the direction.
1) Has somebody already done this? (Unfortunately I couldn't find any similar project.)
2) What kind of caveats could I run into while trying to achieve this?
3) Am I able to do this with this technology? Are there any other sound processing frameworks / technologies / open source projects that could help me?
I am asking here because my research on Google, GitHub and Stack Overflow didn't turn up any relevant results on this specific topic, so any help is highly appreciated!
This is typically done with more traditional DSP and multiple sensors. You might want to look into time difference of arrival (TDOA) and direction of arrival (DOA). Algorithms such as GCC-PHAT and MUSIC will be helpful.
Issues that you might encounter: DOA accuracy is a function of the direct-to-reverberant ratio of the source, i.e. the more reverberant the environment, the harder it is to determine the source location.
You might also want to consider the number of location dimensions you want to resolve. A point in 3D space is much more difficult to resolve than a direction relative to the sensors.
Using ML as an approach to this is not entirely without merit, but you will have to consider what it is you would be learning, i.e. you probably don't want to learn the test room's reverberant properties but rather the sensors' spatial properties.
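To make the TDOA idea concrete, here is a minimal sketch of GCC-PHAT for one pair of microphones, using NumPy. The function names (`gcc_phat`, `doa_angle`), the far-field two-microphone geometry and the 343 m/s speed of sound are assumptions for illustration, not something from the question.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay (in seconds) of `sig` relative to `ref` with GCC-PHAT."""
    n = len(sig) + len(ref)               # zero-pad to avoid circular wrap-around
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)
    cross = SIG * np.conj(REF)            # cross-power spectrum
    cross /= np.abs(cross) + 1e-12        # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)         # generalized cross-correlation
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))
    # Reorder so that lags run from -max_shift to +max_shift.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lag = int(np.argmax(np.abs(cc))) - max_shift
    return lag / fs

def doa_angle(tau, mic_distance, speed_of_sound=343.0):
    """Far-field angle (radians) from broadside for a two-microphone pair (assumed geometry)."""
    ratio = np.clip(speed_of_sound * tau / mic_distance, -1.0, 1.0)
    return float(np.arcsin(ratio))
```

A positive result means `sig` lags behind `ref`. In a reverberant room the correlation peak gets smeared, which is the accuracy problem described above.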

Robot odometry in LabVIEW

I am currently working on a (school) project involving a robot that has to navigate a corn field.
We need to write the complete software in NI LabVIEW.
Because of the tasks the robot has to perform, it has to know its position.
As sensors we have a 6-DOF IMU, some unreliable wheel encoders and a 2D laser scanner (SICK TIM351).
So far I have been unable to find any algorithms or tutorials, and I am really stuck on this problem.
I am wondering if anyone has ever attempted to make SLAM work in LabVIEW, and if so, are there any examples or explanations of how to do this?
Or is there perhaps a toolkit for LabVIEW that contains this function/algorithm?
Kind regards,
Jesse Bax
3rd year mechatronic student
As Slavo mentioned, there's the LabVIEW Robotics module, which contains algorithms like A* for pathfinding. But as far as I am aware, there's not very much in it that can help you solve the SLAM problem. The SLAM problem consists of the following parts: landmark extraction, data association, state estimation and updating of state.
For landmark extraction, you have to pick one or more features that you want the robot to recognize. This can, for example, be a corner or a line (a wall in 3D). To extract them you can use clustering, split and merge, or the RANSAC algorithm. I believe your laser scanner extracts and stores the points in a list sorted by angle, which makes the split and merge algorithm very feasible. RANSAC is the most accurate of them, but it also has a higher complexity. I recommend starting with some optimal data points for testing the line extraction. You can, for example, put your laser scanner in a small room with straight walls, perform one scan and save it to an array or a file. Make sure the contour is a bit more complex than just four walls, and remove noise either before or after measurement.
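As a rough illustration of the line extraction idea, here is a minimal Python sketch of the split step only (the merge step that joins nearly collinear neighbours is left out). It assumes the scan has already been converted to (x, y) points ordered by angle; the helper names and the threshold parameter are hypothetical, and in LabVIEW the same logic would go into a SubVI or a MathScript node.

```python
import math

def polar_to_xy(ranges, angles):
    """Convert laser ranges (m) and beam angles (rad) to (x, y) points."""
    return [(r * math.cos(a), r * math.sin(a)) for r, a in zip(ranges, angles)]

def point_line_distance(p, a, b):
    """Perpendicular distance from point p to the line through a and b."""
    (x, y), (x1, y1), (x2, y2) = p, a, b
    length = math.hypot(x2 - x1, y2 - y1)
    if length == 0.0:
        return math.hypot(x - x1, y - y1)
    return abs((x2 - x1) * (y1 - y) - (x1 - x) * (y2 - y1)) / length

def split(points, threshold):
    """Recursively split an ordered scan into (start, end) line segments."""
    a, b = points[0], points[-1]
    if len(points) < 3:
        return [(a, b)]
    dists = [point_line_distance(p, a, b) for p in points[1:-1]]
    i = max(range(len(dists)), key=dists.__getitem__)
    if dists[i] <= threshold:
        return [(a, b)]                   # all points are close enough to one line
    k = i + 1                             # index of the farthest point in `points`
    # Split at the farthest point; it ends one segment and starts the next.
    return split(points[:k + 1], threshold) + split(points[k:], threshold)
```

For an L-shaped wall, `split` returns two segments that meet at the corner, and that corner point can then serve as a landmark.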
I haven't read up on good methods for data association, but you could, for example, consider a landmark new if it is more than a certain distance away from every existing landmark, and otherwise update the nearest existing landmark.
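That distance rule could look roughly like this in Python. It is only a sketch: the `gate` distance and the simple averaging used to update an existing landmark are assumptions for illustration, and a full SLAM implementation would normally weight the update by uncertainty.

```python
import math

def associate(observation, landmarks, gate=0.5):
    """Match an observed (x, y) point to the nearest landmark within `gate` metres.

    Returns (index, is_new). The `landmarks` list is modified in place.
    """
    best_i, best_d = None, gate
    for i, (lx, ly) in enumerate(landmarks):
        d = math.hypot(observation[0] - lx, observation[1] - ly)
        if d < best_d:
            best_i, best_d = i, d
    if best_i is None:
        landmarks.append(observation)     # nothing close enough: new landmark
        return len(landmarks) - 1, True
    lx, ly = landmarks[best_i]
    # Assumed update rule: plain average of the old and observed positions.
    landmarks[best_i] = ((lx + observation[0]) / 2, (ly + observation[1]) / 2)
    return best_i, False
```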
State estimation and updating of state can be achieved with the complementary filter or the Extended Kalman Filter (EKF). The EKF is the de facto standard for nonlinear state estimation [1] and tends to work very well in practice. The theory behind the EKF is quite tough, but it should be a tad easier to implement. I would recommend using the MathScript module if you are going to program the EKF. The point of these two filters is to estimate the position of the robot from the wheel encoders and the landmarks extracted from the laser scanner.
As the SLAM problem is a big task, I would recommend programming it as multiple smaller SubVIs, so that you can properly test each part without too much added complexity.
There are also a lot of good papers on SLAM.
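To give a feel for the simpler of the two, here is a minimal complementary filter sketch in Python for the heading only, followed by a dead-reckoning position step. It assumes a gyro yaw rate from the IMU and a second, slower heading reference (for example derived from the encoders or from matched landmarks); the function names, the `alpha` weight and this particular choice of inputs are illustrative assumptions, not a full SLAM state update.

```python
import math

def complementary_heading(theta_prev, gyro_rate, theta_ref, dt, alpha=0.98):
    """Blend gyro-integrated heading with a reference heading (all in radians)."""
    predicted = theta_prev + gyro_rate * dt          # short-term: integrate the gyro
    # Shortest signed angular difference, so wrap-around at +/-pi is handled.
    error = math.atan2(math.sin(theta_ref - predicted),
                       math.cos(theta_ref - predicted))
    return predicted + (1.0 - alpha) * error         # long-term: pull toward reference

def dead_reckon(x, y, theta, distance):
    """Advance the position by `distance` (from the encoders) along heading `theta`."""
    return x + distance * math.cos(theta), y + distance * math.sin(theta)
```

A high `alpha` trusts the gyro over short time spans and lets the reference slowly correct the drift. The EKF does the same kind of blending, but derives the weighting from the modelled uncertainties instead of a fixed constant.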
http://www.cs.berkeley.edu/~pabbeel/cs287-fa09/readings/Durrant-Whyte_Bailey_SLAM-tutorial-I.pdf
http://ocw.mit.edu/courses/aeronautics-and-astronautics/16-412j-cognitive-robotics-spring-2005/projects/1aslam_blas_repo.pdf
The book "Probabilistic Robotics".
https://wiki.csem.flinders.edu.au/pub/CSEMThesisProjects/ProjectSmit0949/Thesis.pdf
LabVIEW provides the LabVIEW Robotics module, and there are plenty of templates for it. First, you can check the Starter Kit 2.0 template, which will give you a simple working self-driving robot project. You can then base your own application on that working template instead of starting from scratch.

Logitech Facial Feature tracking

For my application I want to track facial features. I have tried some methods, but none of them provided the required robustness.
The first method is based on Haar face detection, Canny edge detection, contour finding and key point detection. With this approach the landmarks change drastically.
Second, I used flandmark [http://cmp.felk.cvut.cz/~uricamic/flandmark/]. With this approach the number of landmark points obtained is not enough (flandmark detects 7 points).
I have seen the Logitech avatars, and their facial feature tracking was accurate and robust. Any ideas on how they are doing it? It would be helpful.
Check out this example in MATLAB. It uses the Viola-Jones algorithm to detect a face, and the Kanade-Lucas-Tomasi (KLT) point tracking algorithm to track it.

How can I detect floor movements such as push-ups and sit-ups with Kinect?

I have tried to implement this using the skeleton tracking provided by Kinect, but it doesn't work when I am lying down on the floor.
According to Blitz Games CTO Andrew Oliver, there are specific ways to implement this with the depth stream, or by tracking the silhouette of a user, instead of using skeleton frames from the Kinect API. You may find a good example in the video game Your Shape Fitness. Here is a link showing floor movements such as push-ups!
Do you guys have any idea how to implement this, or how to detect movements and compare them with other movements using the depth stream?
What if a 360 degree sensor was developed, one that recognises movements not only directly in front, to the left, or to the right of it, but also optimizes for movement above(?)/below it? The image that I just imagined was the spherical, 360 degree motion sensors often used in secure buildings and such.
Without another sensor, I think you'll need to track the depth data yourself. Here's a paper with some details about how MS implements skeletal tracking in the Kinect SDK, which might get you started. They implement object matching while parsing the depth data to capture joints in the body, so you may need to implement some object templates and matching algorithms yourself, unless you can reuse some of the skeletal tracking libraries to parse objects out of the depth data for you.