Hip points too low - MediaPipe

The MediaPipe library has been working well for our project so far; the only problem we have is that the hip joint points are too low and we need them higher. There isn't a way of getting them higher, right? I'm assuming that's just the way it is.
For a better explanation, please read here:
https://github.com/swisstackle/fp/issues/1
If there isn't a way of fixing this with MediaPipe, do you know of any other libraries where the hip points are placed higher up?
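One possible workaround, rather than a fix inside MediaPipe itself: the pose landmarks are just coordinates, so you can post-process them and synthesize a "higher" hip point by interpolating each hip landmark toward the shoulder on the same side. A minimal sketch, assuming the standard mediapipe Python API (the helper name and the blend factor t are illustrative, not part of MediaPipe):

```python
# Sketch: move each hip landmark a fraction t of the way toward the
# shoulder on the same side. Pure post-processing; MediaPipe itself
# is unchanged. The factor t = 0.2 is an arbitrary example value.
import mediapipe as mp

PoseLandmark = mp.solutions.pose.PoseLandmark

def raised_hip(landmarks, hip_idx, shoulder_idx, t=0.2):
    """Return (x, y, z) for a hip point shifted fraction t toward the shoulder."""
    hip, shoulder = landmarks[hip_idx], landmarks[shoulder_idx]
    return (hip.x + t * (shoulder.x - hip.x),
            hip.y + t * (shoulder.y - hip.y),
            hip.z + t * (shoulder.z - hip.z))

# Usage, given results = pose.process(image) from the Pose solution:
# lms = results.pose_landmarks.landmark
# left  = raised_hip(lms, PoseLandmark.LEFT_HIP,  PoseLandmark.LEFT_SHOULDER)
# right = raised_hip(lms, PoseLandmark.RIGHT_HIP, PoseLandmark.RIGHT_SHOULDER)
```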

Related

Information about CGAL and alternatives

I'm working on a problem that will eventually run on an embedded microcontroller (ESP8266). I need to perform some fairly simple operations on linear equations. I don't need much, but I do need to be able to work with points and linear equations to:
Define an equation for a line, either from two known points or from one point and a gradient
Calculate a new x,y point on an equation line that is a specific distance from another point on that equation line
Drop a perpendicular onto an equation line from a point
Perform variations of cosine-rule calculations on points and triangle sides defined as equations
I roughed up some code for this a while ago based on high-school "y = mx + c" concepts, but it's flawed (it fails with infinities when lines are vertical), and it's currently in Scala. Since I suspect I'm reinventing a wheel, and that's not my primary goal, I'd like to use someone else's work for this!
I've come across CGAL, and it seems very likely it's capable of all this and more, but I have two questions about it (given that it seems to take ages to understand a huge library like this well enough to answer even simple questions!):
It seems to assert some kind of mathematical perfection in its calculations, but that's not important to me, and my system will be severely memory constrained. Does it use or offer memory-efficient approximations?
Is it possible (and hopefully easy) to separate out just a limited subset of features, or am I going to find the entire library (or even a very large subset) heading into my memory-limited machine?
And, I suppose the inevitable follow up: are there more suitable libraries I'm unaware of?
TIA!
The problems you are mentioning sound fairly simple indeed, so I'm wondering if you really need any library at all. Maybe if you post your original code we could help you fix it; your problem sounds like you need to redo a calculation to avoid a division by zero.
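Concretely, the usual fix is to store lines in the implicit form a*x + b*y = c instead of y = mx + c, which has no special case for vertical lines. A minimal sketch of the operations you listed (in Python for brevity, since the Scala original wasn't posted; all names are illustrative):

```python
# Sketch: basic line operations using the implicit form a*x + b*y = c,
# which handles vertical lines with no special case.
import math

def line_from_points(p, q):
    """Line a*x + b*y = c through points p and q, given as (x, y) tuples."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2          # normal = direction rotated 90 degrees
    return a, b, a * x1 + b * y1

def line_from_point_gradient(p, m):
    """Line through p with gradient m (a vertical line is (1, 0, x) directly)."""
    x, y = p
    return -m, 1.0, y - m * x        # rearranged from y = m*x + c

def point_at_distance(line, p, d):
    """Point on `line` at signed distance d from p (p assumed on the line)."""
    a, b, _ = line
    n = math.hypot(a, b)
    return (p[0] + d * b / n, p[1] - d * a / n)   # unit direction is (b, -a)/n

def foot_of_perpendicular(line, p):
    """Drop a perpendicular from p onto `line` and return the foot."""
    a, b, c = line
    t = (a * p[0] + b * p[1] - c) / (a * a + b * b)
    return (p[0] - t * a, p[1] - t * b)
```

With lines in this form, the cosine-rule variations reduce to dot products and math.hypot on point differences, and nothing blows up when x1 == x2.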
As for your point (2) about separating a limited number of features from CGAL: given the size and coding style of that project, in my experience that will be significantly more complicated (if possible at all) than fixing your own code.
In case you want to try a simpler library than CGAL, maybe you could try Boost.Geometry.
Regards,

TensorFlow: how to detect audio direction

I have a task: to determine the sound source location.
I have some experience working with TensorFlow, creating predictions from some simple features and datasets. I assume that for this task it would be necessary to analyze the sound frequencies and probably other related data in the training and prediction steps. The sound comes through a headset, so the human ear is able to detect the direction.
1) Has somebody already done this? (Unfortunately, I couldn't find any similar project.)
2) What kinds of caveats might I run into while trying to achieve this?
3) Can I do this with this technology approach? Are there any other sound-processing frameworks, technologies, or open source projects that could help me?
I'm asking here because my research on Google, GitHub, and Stack Overflow didn't turn up any relevant results on this specific topic, so any help is highly appreciated!
This is typically done with more traditional DSP using multiple sensors. You might want to look into time difference of arrival (TDOA) and direction of arrival (DOA). Algorithms such as GCC-PHAT and MUSIC will be helpful; a rough sketch of GCC-PHAT follows.
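A minimal sketch, assuming NumPy and two time-aligned microphone channels (the function and the constants are illustrative, not from any particular library):

```python
# Sketch: GCC-PHAT time-delay estimate between two microphone signals,
# then a direction estimate for a two-mic array. Illustrative only.
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    """Estimate the delay of sig relative to ref, in seconds."""
    n = len(sig) + len(ref)
    R = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    R /= np.abs(R) + 1e-15                        # PHAT weighting: keep phase only
    cc = np.fft.irfft(R, n=n)
    max_shift = n // 2
    if max_tau is not None:
        max_shift = min(int(fs * max_tau), max_shift)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / float(fs)

# Direction for two mics spaced d metres apart, with c ~ 343 m/s:
# tau = gcc_phat(mic1, mic2, fs, max_tau=d / 343.0)
# angle = np.arcsin(np.clip(tau * 343.0 / d, -1.0, 1.0))  # radians off broadside
```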
Issues that you might encounter: DOA accuracy is a function of the direct-to-reverberant ratio of the source, i.e. the more reverberant the environment, the harder it is to determine the source location.
Also, you might want to consider the number of location dimensions you want to resolve. A point in 3D space is much more difficult to resolve than a direction relative to the sensors.
Using ML as an approach to this is not entirely without merit, but you will have to consider what it is you would be learning, i.e. you probably don't want to learn the test room's reverberant properties but instead the sensors' spatial properties.

Advice for camera buying

I am intending to buy a camera for my research on robotics. I am relatively new to this field; I am ashamed to say that I am absolutely new to CV.
My job is to detect objects and return their [x, y, z] coordinates. My platform is Ubuntu 12.04, and I intend to program in Python.
I have read about some devices on the internet, like the Kinect for Xbox 360, but I have no idea how to choose the best one (by price, and one that suits my job: returning x, y, z with precision < 5 mm after calibrating; no entertainment features needed).
Please advise on the right one at a suitable price, or the best price.
Thanks so much.
With so little information about the problem, I can only give you general advice.
The Kinect camera you mentioned not only captures images but can also give a rough estimate of the distance of every pixel from the camera. This can be very useful for determining the position of an object, but I don't think you can obtain 5 mm precision with it.
I think your best bet is to go with two cameras and do stereo vision to obtain the distance. You can use normal cameras for that, but you'll need high resolution to obtain the precision you want. A sketch of the idea follows.
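As a rough illustration (a sketch using OpenCV's block matcher; the focal length and baseline values are placeholders that in practice come from stereo calibration):

```python
# Sketch: depth from a calibrated stereo pair using OpenCV block matching.
# f (focal length, pixels) and B (baseline, metres) are placeholder values;
# in practice they come from your stereo calibration.
import cv2
import numpy as np

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # 16x fixed point

f, B = 700.0, 0.12                       # placeholders: focal length (px), baseline (m)
valid = disparity > 0
depth = np.zeros_like(disparity)
depth[valid] = f * B / disparity[valid]  # Z = f * B / d
```

Note that the depth error per disparity step grows roughly as Z² / (f * B), which is why resolution (and baseline) matter so much for a 5 mm target.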
I hope it helps.

Is the depth image returned by Microsoft Kinect SDK already undistorted?

Supposedly the Microsoft SDK has access to the Kinect's intrinsic parameters, but does anyone have any idea whether the depth image it returns is actually undistorted? I couldn't find anything relevant.
Let me know if I'm off topic, although I consider this an implicit programming question :)
edit: some other useful links I found that support @Coeffect's answer:
http://ros.org/wiki/kinect_calibration/technical
Accuracy Analysis of Kinect Depth Data, by K. Khoshelham
So the IR pattern that the Kinect projects isn't your normal grid. For an example, check out this blog post. The Kinect handles making a normal depth map out of this. Thinking about focal lengths and such for this system is just going to dig you into a hole. Your thoughts about precision are probably misplaced: the Kinect isn't accurate enough to BE picky about such things. Having used the Kinect for motion detection, I can say there's a lot of noise. If you have a certain situation in mind, you might want to post about it.
edit: Here's a post showing that the depth isn't linear, and more of the precision is focused on closer objects. So the farther away you are, the less precise the data is, and the more severe the noise will become (because having the returned depth change by 1 nearby is pretty much nothing, but farther away that accounts for a larger distance change).
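A toy illustration of that effect, using a community-fitted inverse-disparity model for the original Kinect (the constants below are an often-cited unofficial calibration fit, not Microsoft's):

```python
# Toy illustration: depth step per raw disparity unit for the original
# Kinect, using an often-cited community calibration fit (unofficial).
def raw_to_metres(raw):
    return 1.0 / (raw * -0.0030711016 + 3.3309495161)

for raw in (500, 700, 900, 1000):
    z = raw_to_metres(raw)
    step = raw_to_metres(raw + 1) - raw_to_metres(raw)
    print(f"depth {z:5.2f} m -> step per raw unit {step * 1000:5.1f} mm")
```

The step grows roughly with the square of the distance, matching the Khoshelham analysis linked above.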

Kinect joint detection from top

I'm wondering: does the Kinect detect joints correctly when it's mounted up high (on the ceiling)?
I don't have the necessary equipment to attach it to the ceiling and test, but I was wondering whether it reliably detects a human. I'm OK even if it confuses the joints, actually.
Has anybody tested this?
From what I've seen while using it, the skeleton detection is iffy from any angle other than directly pointing at a person's front or back. A Kinect pointed straight down with people walking under it would almost certainly not detect anyone, because the human form from above does not look anything like it does from the front. I have had the Kinect pick up random people around me in odd positions (sitting, viewed from the side, etc.), but the joints were largely spastic. If you have it mounted on the ceiling but pointed downwards at a shallow enough angle to still see people from the front instead of from above, it could do a fairly good job of picking them up.
So when you say "on the ceiling", do you mean pointing straight down, or still looking at a fairly horizontal angle?
I did a little bit of testing with the Kinect mounted in a very high position (2.5 m, 70° to the ground). As Coeffect answered, it just doesn't work, with either the Microsoft SDK or OpenNI. What I can add is that the skeleton recognition only works if the user is facing the camera with his/her whole body front. Even worse, both frameworks seem to expect the head to be at the top of the depth frame.