React Native - Listen for a specific sound input - Vroom of a car

What I am trying to do is count the revving (the "vroom" sound) of a physical car through my app. I am coding in React Native, and I don't plan to build anything complex, such as communicating with the car's built-in computer.
Instead, I plan to have the app listen to nearby sounds. If the nearby sound is a car revving, the app simply counts it.
I have finished the other features in my app, but listening to the sound and detecting whether it is a "vroom" is where I am stuck.
Based on my research, it looks like I need to use the Fast Fourier Transform algorithm, but I am confused about how to implement it in my React Native app. I am still searching for a package that provides an implementation.
I have seen apps that are used to tune violins, guitars, etc. What I am trying to do is similar, but much simpler. Once I get a basic idea, I will be able to get going. In my case, the app will be listening for a high-decibel sound.
Any inputs would be highly appreciated.

This is known as Acoustic Event Detection, and you can probably treat it as an audio classification problem. The best way to solve it is with supervised machine learning, for example a CNN on mel-spectrograms. Here is an introduction. You can do the same in JavaScript using TensorFlow.js. The official documentation contains a tutorial.
One of the first steps is to collect a small dataset of examples of "vroom" sounds versus other loud non-vroom sounds.
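To make the spectrogram idea more concrete, here is a minimal sketch in plain Python (not React Native, and not taken from the linked tutorial). It splits a signal into windowed frames and takes the magnitude of a naive DFT per frame; a real app would use an FFT library and then map the bins onto the mel scale before feeding a classifier. The frame length, hop size, sample rate and test tone are arbitrary values chosen for illustration, and `hz_to_mel` uses one common mel formula, which is an assumption rather than the only definition.

```python
import cmath
import math


def hann(n):
    """Hann window of length n, used to reduce spectral leakage."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..n/2 (an FFT gives the same values faster)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        mags.append(abs(acc))
    return mags


def spectrogram(samples, frame_len=64, hop=32):
    """Return one magnitude spectrum per overlapping, windowed frame."""
    window = hann(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + i] * window[i] for i in range(frame_len)]
        frames.append(magnitude_spectrum(frame))
    return frames


def hz_to_mel(hz):
    """One common Hz-to-mel conversion (assumed here, other variants exist)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


# Illustrative input: a 100 Hz tone sampled at 800 Hz (hypothetical values).
sample_rate = 800
tone = [math.sin(2 * math.pi * 100 * t / sample_rate) for t in range(256)]
spec = spectrogram(tone)
peak_bins = [row.index(max(row)) for row in spec]
peak_hz = peak_bins[0] * sample_rate / 64  # bin index times bin width
```

Each row of `spec` is one time slice; stacking the rows gives the image-like input that a CNN classifier is trained on.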

Related

Curious on how to use some basic machine learning in a web application

A co-worker and I had an idea to create a little web game where a user enters a chunk of data about themselves and then the application would write for them to sound like them in certain structures. (Trying to leave the idea a little vague.) We are both new to ML and thought this could be a fun first dive.
We have a decent bit of background with PHP, JavaScript (FE and Node), Ruby, and a little bit of other languages, and we have been interested in learning Python for ML. Can you run a cost-efficient ML library for text alongside a web app, given that most servers lack GPUs?
Perhaps you have to pay for one of the cloud-based systems, but we wanted to find the best entry point for this idea without racking up too much cost. (So far I have been reading about running PyTorch or TensorFlow, but it sounds like you lose a lot of efficiency running on CPUs.)
Thank you!
(My other thought is doing it via an iOS app and trying Apple's ML setup.)
It sounds like you are looking for something like TensorFlow.js.
Yes. Before jumping into training something with deep learning (which might even be unnecessary for your purpose), try to build a nice, simple baseline.
Before deep learning (just a few years ago), people did similar tasks using language models based on n-gram features: https://web.stanford.edu/~jurafsky/slp3/3.pdf
Essentially, you try to predict the next few words probabilistically given a small context of n words (n is typically small, like 5 or 6).
This should be a lot of fun to work out and will certainly do quite well with a small amount of data. Such a model will also run blazingly fast, so you don't have to worry about GPUs and compute.
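As a rough illustration of the n-gram baseline described above, here is a minimal sketch in plain Python. It counts which word follows each (n-1)-word context and samples from those counts. The function names and the tiny corpus are made up for this example, and there is no smoothing or back-off, which a real model would need for unseen contexts.

```python
import random
from collections import Counter, defaultdict


def train_ngrams(text, n=3):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    words = text.split()
    model = defaultdict(Counter)
    for i in range(len(words) - n + 1):
        context = tuple(words[i:i + n - 1])
        model[context][words[i + n - 1]] += 1
    return model


def next_word_probs(model, context):
    """Relative frequency of each word seen after `context` (empty if unseen)."""
    counts = model.get(tuple(context), Counter())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def generate(model, seed, length=10, rng=None):
    """Extend `seed` (which must hold n-1 words) by sampling one word at a time."""
    rng = rng or random.Random(0)
    out = list(seed)
    k = len(seed)
    for _ in range(length):
        probs = next_word_probs(model, out[-k:])
        if not probs:
            break  # unseen context: stop instead of guessing
        words, weights = zip(*probs.items())
        out.append(rng.choices(words, weights=weights)[0])
    return " ".join(out)


# Hypothetical training text standing in for what a user would enter.
corpus = "i like tea . i like coffee . i like tea with milk ."
model = train_ngrams(corpus, n=3)
probs = next_word_probs(model, ["i", "like"])
sample = generate(model, ["i", "like"], length=3)
```

In this toy corpus, "i like" is followed by "tea" twice and "coffee" once, so `probs` assigns them 2/3 and 1/3. Training on a user's own text makes the sampled output echo their phrasing.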
To improve on these results with deep learning, you'll first need to collect a ton of data, and it will take work to make it fast on a web-based platform.

Tensorflow: how to detect audio direction

I have a task: to determine the sound source location.
I have some experience working with TensorFlow, making predictions from simple features and datasets. I assume this task would require analyzing the sound frequencies, and probably other related data, in the training and prediction steps. The sound comes from a headset, so the human ear is able to detect the direction.
1) Has somebody already done this? (Unfortunately, I couldn't find any similar project.)
2) What caveats could I run into while trying to achieve this?
3) Can I do this with this technology? Are there any other sound processing frameworks / technologies / open source projects that could help me?
I am asking here because my research on Google, GitHub and Stack Overflow didn't turn up any relevant results on this specific topic, so any help is highly appreciated!
This is typically done with more traditional DSP and multiple sensors. You might want to look into time difference of arrival (TDOA) and direction of arrival (DOA). Algorithms such as GCC-PHAT and MUSIC will be helpful.
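To give a feel for TDOA estimation, here is a minimal GCC-PHAT sketch in plain Python. It uses a naive DFT to stay dependency-free (a real implementation would use an FFT), and it assumes two time-aligned channels. The function names, the synthetic noise signal, and the far-field angle formula with a 343 m/s speed of sound are illustrative assumptions, not something taken from a specific library.

```python
import cmath
import math
import random


def dft(x, inverse=False):
    """Naive O(n^2) DFT / inverse DFT, fine for short illustrative signals."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n) for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def gcc_phat_delay(sig, ref, max_lag):
    """Estimate by how many samples `sig` lags `ref` (negative means it leads)."""
    n = len(sig) + len(ref)  # zero-pad so circular correlation acts like linear
    a = dft(list(sig) + [0.0] * (n - len(sig)))
    b = dft(list(ref) + [0.0] * (n - len(ref)))
    cross = [x * y.conjugate() for x, y in zip(a, b)]
    # PHAT weighting: discard magnitudes and keep only the phase of each bin.
    whitened = [c / (abs(c) + 1e-12) for c in cross]
    corr = [v.real for v in dft(whitened, inverse=True)]
    lags = range(-max_lag, max_lag + 1)
    return max(lags, key=lambda lag: corr[lag % n])


def delay_to_angle_deg(delay_samples, sample_rate, mic_spacing_m, speed_of_sound=343.0):
    """Far-field direction of arrival for a two-sensor pair (assumed geometry)."""
    ratio = speed_of_sound * delay_samples / sample_rate / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clamp against rounding and noise
    return math.degrees(math.asin(ratio))


# Synthetic test: the second channel is the first one delayed by 3 samples.
rng = random.Random(1)
noise = [rng.uniform(-1.0, 1.0) for _ in range(24)]
ref = noise + [0.0] * 8
sig = [0.0] * 3 + noise + [0.0] * 5
delay = gcc_phat_delay(sig, ref, max_lag=8)
```

The estimated delay in samples can then be passed to `delay_to_angle_deg` together with the sample rate and sensor spacing of the actual setup.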
One issue you might encounter: DOA accuracy is a function of the direct-to-reverberant ratio of the source, i.e. the more reverberant the environment, the harder it is to determine the source location.
You might also want to consider the number of location dimensions you want to resolve. A point in 3D space is much more difficult than a direction relative to the sensors.
Using ML as an approach to this is not entirely without merit, but you will have to consider what it is you would be learning, i.e. you probably don't want to learn the test room's reverberant properties but instead the sensors' spatial properties.

How can I create a 3D modeling app? What resources will I need?

I want to create an application that converts 2D images/video into a 3D model. While researching it, I found similar applications such as Trnio, Scann3D, Qlone, and a few others (though some of them produce poor 3D models). I also found out about a technology launched by Microsoft Research called MobileFusion, which showed the same vision I was hoping for in my application, but none of these apps were like that.
Creating a 3D modelling app is a complex task, and achieving it to a high standard requires a lot of studying. To point you in the right direction, you most likely want to perform something called Structure-from-Motion (SfM) or Simultaneous Localization and Mapping (SLAM).
If you want to program this yourself, OpenCV is a good place to start if you know C++ or Python. A typical pipeline involves feature extraction and matching, camera pose estimation, and triangulation, followed by optimisation using bundle adjustment. All pipelines for SfM and SLAM follow these general steps (with exceptions, of course). All of these steps are possible in OpenCV, although Google's Ceres Solver is an excellent open-source bundle adjustment library. SfM generally goes on to dense matching, which gives you very dense point clouds that are good for creating meshes. A free open-source pipeline for this is OpenSfM. Another good source of tools is OpenMVG, which has all of the tools you need to make a full pipeline.
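To illustrate just the triangulation step of such a pipeline, here is a minimal sketch in plain Python. It uses the simple midpoint method: given two camera centres and the viewing rays towards a matched feature, it finds the point halfway between the closest points on the two rays. This is a swapped-in, simplified technique for illustration and is not necessarily what OpenCV or the pipelines above use internally; the camera positions and the feature are made-up values.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def triangulate_midpoint(c1, d1, c2, d2):
    """Midpoint triangulation of two rays c + s * d.

    Returns (point, gap): the estimated 3D point and the distance between
    the two rays at their closest approach, usable as a rough quality check.
    """
    w = [p - q for p, q in zip(c1, c2)]
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel, depth is unobservable")
    s = (b * e - c * d) / denom  # parameter along the first ray
    t = (a * e - b * d) / denom  # parameter along the second ray
    p1 = [p + s * v for p, v in zip(c1, d1)]
    p2 = [p + t * v for p, v in zip(c2, d2)]
    point = [(x + y) / 2 for x, y in zip(p1, p2)]
    return point, math.dist(p1, p2)


# Two hypothetical cameras 1 unit apart, both observing a feature at (0.5, 0, 2).
cam_a, cam_b = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
ray_a = (0.5, 0.0, 2.0)    # direction from cam_a towards the feature
ray_b = (-0.5, 0.0, 2.0)   # direction from cam_b towards the feature
point, gap = triangulate_midpoint(cam_a, ray_a, cam_b, ray_b)
```

With noisy matches the rays no longer intersect, so `gap` grows; bundle adjustment then refines the camera poses and points jointly to reduce that kind of error.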
SLAM is similar to SfM but focuses more on real-time application and less on absolute accuracy. Its applications are centred more around robotics, where a robot wants to know where it is relative to its environment but is not so concerned with absolute accuracy. The top SLAM algorithms are ORB-SLAM and LSD-SLAM. Both are open-source and free for you to integrate into your own software.
So really it depends on what you want: SfM for high accuracy, SLAM for real-time. If you want a good 3D model, I would recommend using existing algorithms, as they are very good.
The best commercial software, in my opinion, is Agisoft Photoscan. If you can make anything half as good as this, I'd be very impressed. As for what resources you will need: in my opinion, Python/C++ skills, the ability to google well, and spare time to read up on photogrammetry and SfM properly.

Shape (preferably human) recognition API for use with standard webcam

I am interested in getting into user interaction/shape detection with a simple USB webcam. I can use multiple webcams, but don't want to be restricted to using something like the Kinect sensor. My detection cameras need to be set up on either side of a helmet (or, if there is only one, on top). I have found some APIs, but they don't really have the functionality I need, and most are geared towards facial recognition. I need to be able to detect a basic human skeletal structure and determine if something is obstructing it. I would really rather do it without any sort of marker system on the target person, and I would like it to be able to target multiple structures. Obviously I am willing to do some tweaking if necessary, but I want to see how close I can get to what I need before I reinvent the wheel. I am trying to design an AI system that can determine how many people are in an area and where they are.
I doubt there will be anything like this, since Microsoft spent a ton of money on the R&D for Kinect and it's probably all locked behind an NDA. I'm also guessing there's a lot of hardware within the Kinect that is not available in a standard webcam.
The closest thing I could find to what you're looking for is the OpenKinect project, which might be a good place to start your research.

Does anyone have any idea how to create a 2D skeleton with the Kinect depthmap?

I'm currently using a Processing Kinect library which supplies a depth map. I was wondering how I could take that and use it to create a 2D skeleton, if possible. Not looking for any code here, just a general process I could use to achieve those results.
Also, given that we've seen this in several of the Kinect games so far, would it be difficult to have multiple skeletons running at once?
Disclaimer: the reason you haven't gotten an answer to this question yet is probably that it is a current research problem. So I can't give you a direct answer, but I will try to help with some information and useful resources on the topic.
There are mainly two different approaches to creating a skeleton from a depth map. The first is to use machine learning; the second is purely algorithmic.
For the machine learning approach, you'd need many samples of people doing a predetermined move, and you'd use those samples to train your favorite learning algorithm. That's the approach taken and implemented by Microsoft in the Xbox (source). It works really well, BUT you need millions of samples to make it reliable... quite a drawback.
The "algorithmic" approach (meaning: without using a training set) can be done in many different ways and is a research problem. It's often based on modeling the possible body postures and trying to match them against the depth image received. That's the approach chosen by PrimeSense (the guys behind the Kinect depth camera technology) for their skeleton tracking tool NITE.
The OpenKinect community maintains a wiki where they list some interesting research material about this topic. You might also be interested in this thread on the OpenNI mailing list.
If you're looking for an implementation of a skeleton tracking tool, PrimeSense released NITE (closed source), the one they made: it's part of the OpenNI framework. That's what's used in most of the videos you might have seen that involve skeleton tracking. I think it's able to handle up to 2 skeletons at the same time, but that requires confirmation.
The best solution is to use FAAST (http://projects.ict.usc.edu/mxr/faast/) which requires OpenNI. I have struggled to get OpenNI to work on my computer. I have not seen an approach yet using Code Laboratories' CL NUI.
An algorithmic approach is http://code.google.com/p/skeletonization/ but you may have a problem because your depth map only represents surfaces, not closed objects.