Capture and play new media types with AVFoundation - objective-c

I've got a very non-standard AVFoundation question and as a relative newbie to the iOS world I could really use some guidance from the experts out there -
I'm working on an app that lets the user record bits of audio which I need to programmatically arrange using AVMutableComposition. Here's the thing: in addition to the audio track, I want to capture and save accelerometer data and have it synced with the sound. Typically AVFoundation is used for known media types like still photos, audio, and video, but I was wondering whether it's feasible to capture something like accelerometer data using this framework. This would make it much easier for me to sync the sensor data with the captured audio, especially when putting the parts together with AVMutableComposition.
Here is what I need to accomplish:
1. Record accelerometer data as an AVAsset/AVAssetTrack so I can insert it into an AVMutableComposition
2. Allow playback of the accelerometer data in a custom view alongside the audio it was recorded with
3. Save the AVMutableComposition to disk, including both the audio and accelerometer tracks. It would be nice to use a standard container like QuickTime
For parts 1 and 3 I'm looking at using the AVAssetReader, AVAssetReaderOutput, AVAssetWriter, and AVAssetWriterInput classes to capture from the accelerometer, but without much experience with Cocoa I'm trying to figure out exactly what I need to extend. At this point I'm thinking I need to subclass AVAssetReaderOutput and AVAssetWriterInput and work with CMSampleBuffers to allow the conversion between the raw accelerometer data and an AVAsset. I've observed that most of these classes only have a single private member referencing a concrete implementation (i.e. AVAssetReaderInternal or AVAssetWriterInputInternal). Does anyone know whether this is a common pattern, or what it means for writing a custom implementation?
I haven't yet given part 2 much thought. I'm currently using an AVPlayer to play the audio but I'm not sure how to have it dispense sample data from the asset to my custom accelerometer view.
Apologies for such an open-ended question - I suppose I'm looking more for guidance than a specific solution. Any gut feelings as to whether this is at all possible with AVFoundation's architecture?

I would use an NSMutableArray and store the accelerometer data plus the time code there. When you play back, you can get the current time from the player and use it to look up the accelerometer data in the array. Since the data is stored as a timeline, you don't have to search the whole array; it is enough to step forward in the array and check when the timestamps coincide.
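A minimal sketch of that idea, assuming a hypothetical AccelSample class and a player-driven update loop (the names and the view method are my own, not from the answer):

#import <AVFoundation/AVFoundation.h>

// Hypothetical sample type: one accelerometer reading plus the time (in seconds
// from the start of the recording) at which it was captured.
@interface AccelSample : NSObject
@property (nonatomic) NSTimeInterval time;
@property (nonatomic) double x, y, z;
@end

@implementation AccelSample
@end

// Step a cursor forward through a time-ordered array of AccelSample objects until
// it points at the last sample at or before playerTime. No searching is needed
// because the samples were appended in order while recording.
static NSUInteger StepCursor(NSArray *samples, NSUInteger cursor, NSTimeInterval playerTime)
{
    while (cursor + 1 < samples.count &&
           [(AccelSample *)samples[cursor + 1] time] <= playerTime) {
        cursor++;
    }
    return cursor;
}

// Usage during playback, e.g. from a periodic time observer on the AVPlayer:
//   NSTimeInterval t = CMTimeGetSeconds(player.currentTime);
//   self.cursor = StepCursor(self.samples, self.cursor, t);
//   [self.accelView showSample:self.samples[self.cursor]];  // hypothetical view method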

Related

Get access to iPhone camera live feed

I need to get access to the actual camera on the iPhone and take a 'snapshot' of it every .X seconds.
On each update, I'll run image processing on it (a modified version of OpenCVCircles) and if conditions are met, exit out of the camera while taking certain actions. If conditions are not met, then I'll simply stay in the camera. The conditions will be a custom configuration of a series of circles that the user has to look at through the camera.
I know this could be done easily by forcing the user to take a picture and then grabbing it from the UIImagePicker. However, I think it would be much better to do it automatically for the user if the image is in view.
Is it possible to do this without completely writing my own camera with the AVCapture classes?
Look into AVCaptureSession, AVCaptureDevice, and related classes. They should provide what you need; we are using them here to provide a live video feed over the network.
Edit
From your question edit, I see this answer does not apply directly to your use case. Yet, to my knowledge, it is the only means that will allow you to accomplish what you seek.
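For what it's worth, a minimal sketch of that setup (the class name, queue name, and the frame-processing placeholder are mine, not from the answer): create a session, attach the camera as input, attach an AVCaptureVideoDataOutput with a sample buffer delegate, and run the circle detection on each delivered frame.

#import <AVFoundation/AVFoundation.h>

@interface CircleScanner : NSObject <AVCaptureVideoDataOutputSampleBufferDelegate>
@property (nonatomic, strong) AVCaptureSession *session;
@end

@implementation CircleScanner

- (void)startCapture
{
    self.session = [[AVCaptureSession alloc] init];

    // Camera input.
    AVCaptureDevice *camera = [AVCaptureDevice defaultDeviceWithMediaType:AVMediaTypeVideo];
    NSError *error = nil;
    AVCaptureDeviceInput *input = [AVCaptureDeviceInput deviceInputWithDevice:camera error:&error];
    if (input) [self.session addInput:input];

    // Frame output: BGRA pixel buffers delivered to a background queue.
    AVCaptureVideoDataOutput *output = [[AVCaptureVideoDataOutput alloc] init];
    output.videoSettings = @{ (id)kCVPixelBufferPixelFormatTypeKey : @(kCVPixelFormatType_32BGRA) };
    output.alwaysDiscardsLateVideoFrames = YES; // drop frames rather than queue them up
    [output setSampleBufferDelegate:self queue:dispatch_queue_create("camera.frames", NULL)];
    [self.session addOutput:output];

    [self.session startRunning];
}

// Called for every captured frame; throttle here if you only need one frame every X seconds.
- (void)captureOutput:(AVCaptureOutput *)captureOutput
didOutputSampleBuffer:(CMSampleBufferRef)sampleBuffer
       fromConnection:(AVCaptureConnection *)connection
{
    CVImageBufferRef pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);
    // ... wrap pixelBuffer for OpenCV, run the circle detection, and
    //     call [self.session stopRunning] once the condition is met ...
    (void)pixelBuffer;
}

@end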

MPEG-2 seeking, where to start?

I'd like to be able to seek to an arbitrary frame in an MPEG-2 file (from a DVD; I guess it's called an MPEG-2 Program Stream). So far I had been using OpenCV 2.1 to access those frames, but that only worked on a frame-after-frame basis (forward seeking only). When I installed OpenCV 2.3.1, even that possibility was lost (it is limited to AVI there). Anyway, I'd like to do this without OpenCV.
I've managed to seek to keyframes (I suppose), or to every so-and-so frame (e.g. every 12th frame). Now, looking at VirtualDub, frame-accurate seeking is possible; it says "parsing interleaved MPEG-2 file". What exactly does that mean, and where would I have to start to do the same? Is it even legal? I remember reading something about that somewhere, but I can't really remember where.
I'm programming in C++ using DirectShow. As far as I know, DirectShow won't do it. Then I was looking into CBaseFilter, the StreamTime method, etc., but before I dive into that complex topic I'd like to know if that's the right way to go. Looking forward to your answers, thanks!
@Geraint: code snippet of the filter graph:
// Create the filter graph manager.
CoCreateInstance(CLSID_FilterGraph, NULL, CLSCTX_INPROC, IID_IGraphBuilder, (LPVOID *)&pGraphBuilder);

// Create the MPEG-2 demultiplexer and MPEG-2 video decoder filters and add them to the graph.
CoCreateInstance(CLSID_MPEG2Demultiplexer, NULL, CLSCTX_INPROC, IID_IBaseFilter, (LPVOID *)&pib);
CoCreateInstance(CLSID_CMPEG2VidDecoderDS, NULL, CLSCTX_INPROC, IID_IBaseFilter, (LPVOID *)&pib2);
pGraphBuilder->AddFilter(pib, L"Sample Splitter");
pGraphBuilder->AddFilter(pib2, L"Sample Decoder");

// Media type for the demux video output pin.
ZeroMemory(&am_media_type, sizeof(am_media_type));
am_media_type.majortype = MEDIATYPE_Video;
am_media_type.subtype = MEDIASUBTYPE_MPEG2_VIDEO;
am_media_type.formattype = FORMAT_MPEG2Video;

// Query the interfaces used for control, seeking, frame stepping and events.
pGraphBuilder->QueryInterface(IID_IMediaControl, (LPVOID *)&pMediaControl);
pGraphBuilder->QueryInterface(IID_IMediaSeeking, (void **)&pMediaSeeking);
pGraphBuilder->QueryInterface(__uuidof(IVideoFrameStep), (PVOID *)&fst);
pGraphBuilder->QueryInterface(IID_IMediaEvent, (void **)&imev);
pGraphBuilder->QueryInterface(IID_IBasicVideo, (LPVOID *)&ibv);

// Build the rest of the graph from the file.
pGraphBuilder->RenderFile(FILENAME, 0);
Then I use IMediaSeeking to seek in the video. I've also tried frame stepping (hence the extra interfaces queried above).
DirectShow is capable of delivering frame-accurate seeking. However, without an index, this is based on a time offset from file start, not a frame count.
Use IMediaSeeking to set the start time. The demux will begin delivery of compressed frames some way before that. The decoder will start decoding at the previous key frame but will discard any frames that are before your chosen start point.
G

NAudio - Create software beat machine / sampler - General Strategy

I'm new to NAudio, but the goal of my project is to let the user listen to an MP3 and then select a sample or "chunk" of that song which can be saved to disk. These samples should be able to be replayed at the same time (i.e. not merged, but played simultaneously).
Could someone please let me know the overall strategy required to achieve this (not necessarily the specifics... almost like pseudocode)?
For example, would the samples/chunks of a song need to be saved as WAV files? And could these samples then be played together in the WAV format, etc.?
I have seen a few small examples implementing some of the ideas I've mentioned above, but I don't have a good sense of the big picture just yet.
Thanks in advance,
Andrew
The chunks wouldn't need to be saved as WAV files unless you were keeping them for future use. You can store the PCM audio (Mp3FileReader automatically converts to PCM) in a byte array and use RawSourceWaveStream to play them.
As for mixing them, I'd recommend using the MixingSampleProvider. This does mean you need to convert your RawSourceWaveStream to IEEE float, but you can use Pcm16BitToSampleProvider to do this. This will give the advantage that you can adjust volumes (and do other DSP) easily on the samples you are mixing. MixingSampleProvider also auto-removes completed inputs, so you can just add new inputs whenever you want to trigger a sound.

How can I detect floor movements such as push-ups and sit-ups with Kinect?

I have tried to implement this using the skeleton tracking provided by Kinect, but it doesn't work when I am lying down on the floor.
According to Blitz Games CTO Andrew Oliver, there are specific ways to implement this with the depth stream or by tracking the user's silhouette, instead of using skeleton frames from the Kinect API. You may find a good example in the video game Your Shape Fitness. Here is a link showing floor movements such as push-ups!
Do you have any idea how to implement this, or how to detect movements and compare them with other movements using the depth stream?
What if a 360-degree sensor were developed, one that recognizes movement not only directly in front of it, or to its left and right, but also above and below it? The image I have in mind is the spherical, 360-degree motion sensors often used in secure buildings and the like.
Without another sensor, I think you'll need to track the depth data yourself. Here's a paper with some details about how MS implements skeletal tracking in the Kinect SDK that might get you started. They implement object matching while parsing the depth data to locate the joints of the body; you may need to implement some object templates and matching algorithms to do this yourself, unless you can reuse some of the skeletal tracking libraries to parse objects out of the depth data for you.

Creating simple waveforms with CoreAudio

I am new to CoreAudio, and I would like to output a simple sine wave and square wave with a given frequency and amplitude through the speakers using CA. I don't want to use sound files as I want to synthesize the sound.
What do I need to do this? And can you give me an example or tutorial? Thanks.
There are a number of errors in the previous answer. I, the legendary :-) James McCartney, not James Harkins, wrote the sinewavedemo; I also wrote SuperCollider, which is what the audiosynth.com website is about. I also now work at Apple on CoreAudio. The sinewavedemo DOES use CoreAudio, since it uses AudioHardware.h from CoreAudio.framework to play the sound.
You should not use the sinewavedemo. It is very old code and it makes dangerous assumptions about the buffer layout of the audio hardware. The easiest way nowadays to play a sound that you are generating is to use an AudioQueue, or an output audio unit with a render callback set.
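As a rough illustration of the second option, here is a minimal sketch of a default output unit with a render callback that synthesizes a sine wave (error checking omitted; it assumes the unit's default stream format is non-interleaved 32-bit float, which a real app would set explicitly via kAudioUnitProperty_StreamFormat):

#import <AudioToolbox/AudioToolbox.h>
#include <math.h>

static double       gPhase      = 0.0;
static const double kFrequency  = 440.0;
static const double kSampleRate = 44100.0;

// Render callback: called on the audio thread whenever the output unit needs samples.
static OSStatus RenderSine(void *inRefCon, AudioUnitRenderActionFlags *ioActionFlags,
                           const AudioTimeStamp *inTimeStamp, UInt32 inBusNumber,
                           UInt32 inNumberFrames, AudioBufferList *ioData)
{
    double phaseIncrement = 2.0 * M_PI * kFrequency / kSampleRate;
    for (UInt32 frame = 0; frame < inNumberFrames; frame++) {
        Float32 sample = (Float32)sin(gPhase);
        gPhase += phaseIncrement;
        if (gPhase >= 2.0 * M_PI) gPhase -= 2.0 * M_PI;
        // Write the same sample to every (non-interleaved) channel buffer.
        for (UInt32 buf = 0; buf < ioData->mNumberBuffers; buf++) {
            ((Float32 *)ioData->mBuffers[buf].mData)[frame] = sample;
        }
    }
    return noErr;
}

void StartSine(void)
{
    // Find and instantiate the default output unit (use kAudioUnitSubType_RemoteIO on iOS).
    AudioComponentDescription desc = { kAudioUnitType_Output, kAudioUnitSubType_DefaultOutput,
                                       kAudioUnitManufacturer_Apple, 0, 0 };
    AudioComponent comp = AudioComponentFindNext(NULL, &desc);
    AudioUnit outputUnit;
    AudioComponentInstanceNew(comp, &outputUnit);

    // Attach the render callback to the unit's input scope, then start it.
    AURenderCallbackStruct callback = { RenderSine, NULL };
    AudioUnitSetProperty(outputUnit, kAudioUnitProperty_SetRenderCallback,
                         kAudioUnitScope_Input, 0, &callback, sizeof(callback));
    AudioUnitInitialize(outputUnit);
    AudioOutputUnitStart(outputUnit);
}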
The best and easiest way to do this without files is to prepare a single-cycle buffer containing one cycle of the wave (technically, this is called a wavetable).
In the playback function called by the CoreAudio thread, fill the output buffer with samples read from the wave buffer.
Note, however, that you will face two problems very quickly:
- For the sine wave, if the playback frequency is not an integer multiple of the desired sine frequency, you will probably need to implement an interpolator (see the sketch after this list) if you want good quality. Using only integer table indices will generate a significant level of harmonic noise.
- For the square wave, avoid simply filling an array with +1/-1 values. Such a signal is not bandlimited and will alias a lot. Do not forget that the spectrum of a square wave is virtually infinite!
To find good algorithms for signal generation, take a look at musicdsp.org; it's probably one of the best resources for that.
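A rough sketch of the single-cycle wavetable with linear interpolation described above, in plain C so it can be called from a render callback (the table size, function names, and use of global state are my own choices, not from the answer):

#include <math.h>

#define TABLE_SIZE 1024
static float  gSineTable[TABLE_SIZE];
static double gTablePhase = 0.0;   // current read position, in table samples

// Fill one cycle of a sine wave (the "wavetable"); call once at startup.
static void FillSineTable(void)
{
    for (int i = 0; i < TABLE_SIZE; i++) {
        gSineTable[i] = sinf(2.0f * (float)M_PI * (float)i / TABLE_SIZE);
    }
}

// Produce `count` samples at `frequency` Hz into `out`, linearly interpolating
// between neighbouring table entries to reduce the harmonic noise mentioned above.
static void RenderWavetable(float *out, int count, double frequency, double sampleRate)
{
    double increment = frequency * TABLE_SIZE / sampleRate; // table samples per output sample
    for (int n = 0; n < count; n++) {
        int    index = (int)gTablePhase;
        double frac  = gTablePhase - index;
        float  a     = gSineTable[index];
        float  b     = gSineTable[(index + 1) % TABLE_SIZE];
        out[n] = (float)(a + frac * (b - a));
        gTablePhase += increment;
        if (gTablePhase >= TABLE_SIZE) gTablePhase -= TABLE_SIZE;
    }
}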
Are you new to audio programming in general? As a starting point I would check out
http://www.audiosynth.com/sinewavedemo.html
This is a minimal OS X sinewave implementation by the legendary James Harkins. Note that it doesn't use CoreAudio at all.
If you specifically want to use CoreAudio for your sinewave, you need to create an output unit (RemoteIO on the iPhone, AUHAL on OS X) and supply an input callback, where you can pretty much use the code from the above example. Check out
http://developer.apple.com/mac/library/technotes/tn2002/tn2091.html
The chief benefits of CoreAudio are that you can chain other effects with your sinewave, write plugins for hosts like Logic and provide the interfaces for them, or write a host (like Logic) for plugins that can be chained together.
If you don't want to write a plugin or host plugins, then CoreAudio might not actually be for you. But one of the best things about using CoreAudio is that once you get your sinewave callback working, it is easy to add effects or mix multiple sines together.
To do this you need to put your output unit in a graph, to which you can add effects, mixers, etc.
Here is some help on setting up graphs: http://timbolstad.com/2010/03/16/core-audio-getting-started-pt2/
It isn't as difficult as it looks. Apple provides C++ helper classes for many things (/Developer/Examples/CoreAudio/PublicUtility) and even if you don't want to use C++ (you don't have to!) they can be a useful guide to the CoreAudio API.
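To give a feel for the graph route, here is a minimal sketch of an AUGraph with a mixer feeding the default output, with a sine render callback (like the one in the earlier sketch) attached to one mixer input. The arrangement and names are my own and error checking is omitted:

#import <AudioToolbox/AudioToolbox.h>

// The sine render callback from the earlier sketch; in practice it would live in the same file.
OSStatus RenderSine(void *inRefCon, AudioUnitRenderActionFlags *ioActionFlags,
                    const AudioTimeStamp *inTimeStamp, UInt32 inBusNumber,
                    UInt32 inNumberFrames, AudioBufferList *ioData);

void StartGraph(void)
{
    AUGraph graph;
    NewAUGraph(&graph);

    AudioComponentDescription outputDesc = { kAudioUnitType_Output, kAudioUnitSubType_DefaultOutput,
                                             kAudioUnitManufacturer_Apple, 0, 0 };
    AudioComponentDescription mixerDesc  = { kAudioUnitType_Mixer, kAudioUnitSubType_MultiChannelMixer,
                                             kAudioUnitManufacturer_Apple, 0, 0 };

    AUNode outputNode, mixerNode;
    AUGraphAddNode(graph, &outputDesc, &outputNode);
    AUGraphAddNode(graph, &mixerDesc, &mixerNode);

    AUGraphOpen(graph);

    // Mixer output bus 0 -> output unit input bus 0.
    AUGraphConnectNodeInput(graph, mixerNode, 0, outputNode, 0);

    // Feed the sine generator into mixer input bus 0; further sines or effect
    // nodes can be added on additional buses later.
    AURenderCallbackStruct cb = { RenderSine, NULL };
    AUGraphSetNodeInputCallback(graph, mixerNode, 0, &cb);

    AUGraphInitialize(graph);
    AUGraphStart(graph);
}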
If you are not doing this in real time, using the sin() function from math.h is not a bad idea. Just fill however many samples you need with sin() beforehand and, when it is time to play the sound, send that buffer to the audio output. sin() can be quite slow to call once per sample if you are doing this in real time; an interpolated wavetable lookup is much faster, but the resulting sound will not be as spectrally pure.
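In other words, something along these lines (a hypothetical helper; the buffer would then be handed to whatever playback path you use):

#include <math.h>

// Precompute `count` samples of a sine wave off the audio thread; the result can
// then be copied into the output buffers (or queued) when playback starts.
static void PrefillSine(float *buffer, int count, double frequency,
                        double sampleRate, float amplitude)
{
    for (int n = 0; n < count; n++) {
        buffer[n] = amplitude * (float)sin(2.0 * M_PI * frequency * n / sampleRate);
    }
}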
There is a good and well-documented sine wave player code example in Chapter 7 of the Adamson/Avila "Learning Core Audio" book, published by Addison-Wesley Professional (ISBN-10: 0-321-63684-8):
http://www.informit.com/store/learning-core-audio-a-hands-on-guide-to-audio-programming-9780321636843
It is a rather new publication (2012) and addresses precisely the issue of this question. It's only a starting point, but it's a valuable starting point.
BTW, don't jump to graphs before you have this basic lesson (which involves some math) behind you.
Concerning example code, a quick and efficient method I often use relies on a pre-filled sinewave lookup table with as many entries as the sample rate; for 44100 Hz, the table has a size of 44100. In other words, the cycle length equals the sample rate. This gives an acceptable trade-off between speed and quality in many cases. You can initialize it when the program starts.
If you generate floating-point samples (which is the default in OS X) and use math functions, use sinf() rather than (float)sin(). Float-to-double promotions in the inner loop of a render callback are always resource-expensive. So are repeated multiplications by constants, such as 2.0*M_PI, which can all too often be found in code examples.
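A small sketch of that table approach, assuming integer frequencies in Hz (the names and global state are my own): with the table length equal to the sample rate, advancing the read index by the frequency each sample steps through exactly that many cycles per second, so no fractional phase or per-sample trigonometry is needed.

#include <math.h>
#include <stdlib.h>

static float *gSineTable   = NULL;  // one full cycle spread over one second: length == sample rate
static int    gTableLength = 0;
static int    gReadIndex   = 0;

// Call once at startup, outside the render callback.
static void InitSineTable(int sampleRate)
{
    gTableLength = sampleRate;
    gSineTable   = malloc(sizeof(float) * sampleRate);
    float twoPiOverSR = 2.0f * (float)M_PI / (float)sampleRate;  // hoisted out of the loop
    for (int i = 0; i < sampleRate; i++) {
        gSineTable[i] = sinf(twoPiOverSR * (float)i);            // sinf(), not (float)sin()
    }
}

// Fill `frames` output samples at an integer frequency in Hz.
static void RenderFromTable(float *out, int frames, int frequencyHz)
{
    for (int n = 0; n < frames; n++) {
        out[n] = gSineTable[gReadIndex];
        gReadIndex = (gReadIndex + frequencyHz) % gTableLength;  // integer-only phase step
    }
}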