How to programmatically test for audio sync - testing

I have a multimedia application that, among other things, converts video using FFmpeg. Video conversion being the pain that it is, my test suite includes tests that check our ability to convert various video formats, with emphasis on sample videos known not to work.
A common problem reported by users is that some videos end up with their audio out of sync after being processed, and I am looking for a way to check for this in my tests.
Extracting the audio portion of the resulting videos is not a problem.
My best idea so far would be to check the offset of the first non-silence at both the beginning and end and compare each between the two videos, but I'm hoping someone smart has a better idea.
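To make the idea above concrete, here is a rough sketch of the first/last non-silence comparison. It assumes both audio tracks have already been decoded to lists of float samples in the range -1.0 to 1.0 at the same sample rate; the function names and the 0.01 threshold are illustrative choices, not part of any library.

```python
def non_silence_bounds(samples, sample_rate, threshold=0.01):
    """Return (start_s, end_s): times of the first and last sample whose
    absolute amplitude exceeds `threshold`, or None if everything is silent."""
    first = last = None
    for i, s in enumerate(samples):
        if abs(s) > threshold:
            if first is None:
                first = i
            last = i
    if first is None:
        return None
    return first / sample_rate, last / sample_rate


def sync_drift(reference, converted, sample_rate, threshold=0.01):
    """Return (start_drift_s, end_drift_s) between two decoded audio tracks.

    Positive values mean the converted track's audio occurs later than
    the reference track's audio.
    """
    ref = non_silence_bounds(reference, sample_rate, threshold)
    out = non_silence_bounds(converted, sample_rate, threshold)
    if ref is None or out is None:
        raise ValueError("one of the tracks is entirely silent")
    return out[0] - ref[0], out[1] - ref[1]
```

If the start and end drifts are equal, the audio was shifted as a whole; if they differ, the audio was stretched or truncated somewhere along the way.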
The application language/environment is Java, but since this is for testing, I'm free to use any toolset.

The basic problem is likely that the video and audio are different lengths. Extract the audio and test its length vs. the video length. If they are significantly different (more than maybe .05 sec, I'm not really sure what is detectable as "off"), then there's a problem.
To fix it, re-encode the audio to match the video length, and then put the audio and video back into a container format.
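The length check described above could be sketched roughly as follows, assuming ffprobe is on the PATH and that the container reports a per-stream duration (some containers do not, in which case the lookup below raises KeyError). The function names are made up for illustration, and the 0.05 second tolerance is simply the guess from the answer.

```python
import json
import subprocess


def probe(path):
    """Run ffprobe and return its JSON description of each stream's type and duration."""
    result = subprocess.run(
        ["ffprobe", "-v", "error",
         "-show_entries", "stream=codec_type,duration",
         "-of", "json", path],
        check=True, capture_output=True, text=True,
    )
    return result.stdout


def stream_durations(ffprobe_json):
    """Map codec_type ("video", "audio") to duration in seconds."""
    durations = {}
    for stream in json.loads(ffprobe_json).get("streams", []):
        value = stream.get("duration")
        if value not in (None, "N/A"):  # duration may be absent for some containers
            durations[stream["codec_type"]] = float(value)
    return durations


def av_length_mismatch(ffprobe_json, tolerance_s=0.05):
    """True if audio and video durations differ by more than the tolerance."""
    d = stream_durations(ffprobe_json)
    return abs(d["video"] - d["audio"]) > tolerance_s
```

In a test this would be called as `av_length_mismatch(probe("converted.mp4"))`, where the file name is a placeholder.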

Related

How to playback multiple audio files synchronously in Expo-av?

In my app, users record themselves singing over a backing track, and then later play back the recorded audio and the backing track at the same time. I use expo-av for my audio system. The problem is that at the playback stage the audio is often out of sync, because Expo only really supports asynchronous audio. Does anyone have any advice on how to approach this problem at a high level?
A few of my ideas:
Mix the two audio files into a single file for playback. This almost works except for the fact that the recording and backing track are also out of sync. If I knew exactly how much they were offset, I could just add that amount of silence to one of the files when mixing. However, I haven't found a way to accurately calculate this offset.
Reduce the time it takes for recording and playback to start, so that the latency is not noticeable. Some things I've found that help here are recording at lower quality and using smaller audio files. Any other tips here would be appreciated.
Use a different audio library than expo-av. Is there one that comes to mind that better supports synchronous audio? Ideally it would also be supported by Expo or at least React Native.
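For the first idea, one possible way to estimate the offset is cross-correlation against a known reference signal. The sketch below is written in Python for brevity rather than for the Expo runtime, and it rests on a big assumption: the recording contains an audible copy of some reference, for example a short click or count-in from the backing track picked up by the microphone. If the recording holds only the voice, this approach has nothing to lock onto. The function name and the brute-force search are illustrative only.

```python
def estimate_offset(reference, recording, max_lag):
    """Return the lag in samples at which `recording` best matches `reference`.

    A positive lag means the reference appears `lag` samples later in the
    recording, i.e. the recording is delayed. Brute-force cross-correlation,
    O(len(reference) * max_lag), so keep the reference short.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(recording):
                score += r * recording[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

Dividing the returned lag by the sample rate gives the amount of silence, in seconds, to add to (or trim from) one of the files before mixing.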

Frame by frame decode using Media Source Extension

I've been digging through the Media Source Extension examples on the internet and haven't quite figured out a way to adapt them to my needs.
I'm looking to take a locally cached MP4/WebM video (with 100% keyframes and a 1:1 ratio of clusters/atoms to keyframes), decode and display its frames non-sequentially (i.e. frame 10, 400, 2, 100, etc.), and render these non-sequential frames on demand at rates from 0 to 60 fps. The simple non-MSE approach using the currentTime property fails due to the latency between setting this property and getting a frame displayed.
I realize this is totally outside normal expectations for video playback, but my application requires this type of non-sequential high speed playback. Ideally I can do this with h264 for GPU acceleration but I realize there could be some platform specific GPU buffers to contend with, though it seems that a zero frame buffer should be possible (see here). I am hoping that MSE can accomplish this non-sequential high framerate low latency playback, but I know I'm asking for a lot.
Questions:
Will appendBuffer accept a single WebM cluster / MP4 Atom made up of a single keyframe, and also be able to decode at a high frequency (60fps)?
Do you think what I'm trying to do is possible in the browser?
Any help, insight, or code suggestions/examples would be much appreciated.
Thanks!
Update 4/5/16
I was able to get MSE mostly working with single-frame MP4 fragments in Firefox, Edge, and Chrome. However, Chrome seems to be running into the frame buffer issue linked above, and I haven't found a way to pre-process an MP4 to invoke this "low delay" mode. Does anyone have any clues as to whether it's possible to create such a file with an existing tool like MP4Box?
Firefox and Edge decode/display the individual frames immediately with very little latency, but of course something breaks once I load this video into a Three.js WebGL project (no video output, no errors). I'm ignoring this for now as I'd much rather have things working on Chrome as I'll be targeting Android as well.
I was able to get this working pretty well. The key was getting Chrome to enter its "low delay" mode by muxing a specially crafted MP4 file using modified MP4Box sources. I added one line in movie_fragments.c so that it reads:
    if (movie->moov->mvex->mehd && movie->moov->mvex->mehd->fragment_duration) {
        trex->track->Header->duration = 0;
        Media_SetDuration(trex->track);
        movie->moov->mvex->mehd->fragment_duration = 0;
    }
Now every MP4 created will have the MEHD fragment duration set to 0 which causes Chrome to process it as a live stream.
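To check from a test that a muxed file really carries a zero MEHD fragment duration, a small box walker is enough. The sketch below is not part of MP4Box; it assumes the standard ISO base media file format layout (4-byte big-endian size plus 4-byte type, with `mehd` located at moov/mvex/mehd as a full box whose duration is 32-bit for version 0 and 64-bit for version 1). All function names are made up for illustration.

```python
import struct


def iter_boxes(data, start=0, end=None):
    """Yield (type, payload_start, payload_end) for each box in data[start:end]."""
    end = len(data) if end is None else end
    pos = start
    while pos + 8 <= end:
        size, box_type = struct.unpack_from(">I4s", data, pos)
        header = 8
        if size == 1:  # a 64-bit size follows the type field
            if pos + 16 > end:
                break
            size = struct.unpack_from(">Q", data, pos + 8)[0]
            header = 16
        elif size == 0:  # box extends to the end of the enclosing range
            size = end - pos
        if size < header or pos + size > end:
            break  # malformed box; stop walking
        yield box_type, pos + header, pos + size
        pos += size


def find_box(data, path):
    """Follow a path such as [b"moov", b"mvex", b"mehd"]; return payload bounds or None."""
    start, end = 0, len(data)
    for name in path:
        for box_type, p_start, p_end in iter_boxes(data, start, end):
            if box_type == name:
                start, end = p_start, p_end
                break
        else:
            return None
    return start, end


def mehd_fragment_duration(data):
    """Return the fragment_duration stored in moov/mvex/mehd, or None if absent."""
    bounds = find_box(data, [b"moov", b"mvex", b"mehd"])
    if bounds is None:
        return None
    start, _ = bounds
    version = data[start]  # full box: 1 byte version, 3 bytes flags
    fmt = ">Q" if version == 1 else ">I"
    return struct.unpack_from(fmt, data, start + 4)[0]
```

Reading the first part of the file with `open(path, "rb").read()` and asserting `mehd_fragment_duration(data) == 0` would confirm the patched muxer did its job, provided the moov box sits within the bytes read.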
I still have one remaining issue related to the timestampOffset property, which, in combination with the FPS set in the media fragments, controls the playback speed. Since I'm looking to control the FPS directly, I don't want any added delay from the MSE playback engine. I'll post a separate question here to address that.
Thanks,
Dustin

Record audio in OS X into FLAC using Cocoa

I am trying to record audio from a microphone/iSight camera on a Mac into an NSData object.
I have tried to do it using QTKit, but I found out that you can only save the result as a .mov file.
What I actually want is to encode the audio into a FLAC file. Is that possible, or will I need to use another framework?
Thanks.
Grab the source for VLC (if you can deal w/GPL -- it has limitations on use that many find onerous) and have a read. It does transcoding, amongst other things.
Beyond that, one dead simple approach is to save as AIFF and then use a command line tool (via NSTask) to do the conversion.
Or you could just go with Apple Lossless -- it is open source these days.
Of course, this also raises the question: why do you need lossless compression when recording voice (low bandwidth in the first place) via a relatively sub-par microphone?

QTKit: Analog for VideoContext for the sound

I am writing a simple application for streaming video over the network, using an approach slightly different from the ordinary "H.264 over RTP" (I am using my own codecs).
To achieve this, I need the raw frames and raw audio samples that QTMovie implicitly sends to QTMovieView when playing back a movie.
The most common way to retrieve raw video frames is to use a VisualContext and then, in a display link callback, generate a CVPixelBufferRef from that VisualContext. So I get frames at a frequency that is synchronized with my current refresh rate (not that I need this synchronization; I only need a "stream" of frames that I can transmit over the network, but the Core Video Programming Guide and most Apple samples related to video promote this approach).
The first problem I ran into is that once I attach a VisualContext to a QTMovie, the picture can no longer be rendered onto the QTMovieView. I don't know why this happens (I guess it's related to the idea of GWorld and the rendering being "detached" from it when I attach the VisualContext). OK, at least I have frames, which I could render onto a simple NSView (though this sounds wrong and performance-unfriendly; am I doing it right?).
As for the sound, I have no idea what to do. I need to get raw audio samples while the movie is being played (ideally something similar to what QTCaptureDecompressedAudioOutput returns in its callback).
I am prepared to delve into the deprecated Carbon QuickTime APIs if there is no other way, but I don't even know where to start. Should I use the same Core Video display link and periodically retrieve sound somehow? Should I get a QTDataReference and locate the sound frames manually?
I am a beginner at programming video and audio services. If you could share some experience, I would REALLY appreciate any ideas :)
Thank you,
James

Mac OS X equivalent for DirectShow, GraphEdit

New to Mac OS X, familiar with Windows. Windows has DirectShow, a good number of built-in filters, COM programming, and GraphEdit for very fast prototyping and snooping on the graphs you've constructed in code.
I'm now about to go to the Mac to work with cameras, webcams, microphones, color spaces, files, splitting, synchronization, rendering, file reading, file saving, and many of the things I've come to take for granted with DirectShow when putting together applications for live performance. On the Mac side, so far I've found ... nothing! Either I don't know where to look or I'm having the toughest time tying the Mac's reputation for its ease of handling media with a coherent programmatic ability to get in there and start messin' with media manipulatin' building blocks.
I've seen some weak suggestions to use gstreamer or some library for QT but I can't bring myself to believe that this is the Apple way to go. And I've come across some QuickTime documentation but I'm not looking to do transitions, sprites, broadcasting, ...
Having a brain trained on DirectShow means I don't even know how Apple thinks about providing DirectShow-like functionality. That means I don't know the right keywords and don't even know where to look. Books? Bought a few. Now I might be able to write some code that can edit your sister's wedding video (if I can't make decent headway on this topic I may next be asking what that'd be worth to you), but for identifying what filters are available and how to string them together ... nothing. Suggestions?
Video handling is going through a huge transition on the Mac at the moment. QuickTime is very old, but also big and powerful, so it's been undergoing an incremental replacement process for the past 5 years or so.
That said, QTKit is the QuickTime subset (capture, playback, format conversion and basic video editing) which is supported going forward. The legacy QuickTime APIs are still there for the moment, and will probably remain at least until their major features are available elsewhere, but they are 32-bit only. For some involved video work you may end up needing to use them in places.
At the moment, iOS is ahead of the Mac because it could start from scratch with AV Foundation. The future of the Mac media frameworks will probably either be AV Foundation directly (with QTKit being a lightweight shim over the top) or an extension of QTKit that looks very similar.
For audio there's Core Audio which is on Mac and iOS and isn't going away any time soon. It's quite powerful but somewhat obtuse in places. Luckily online support is very good; the mailing list is an essential resource.
For filters and frame-level processing you've got Core Video as someone else mentioned, as well as Core Image. For motion graphics there's Quartz Composer which includes a graphical editor and a plugin architecture to add your own patches. For programmatic procedural animation and easily mixing rendering models (OpenGL, Quartz, video, etc.) there's Core Animation.
In addition to all of these, of course there's no reason you can't use open source libraries where the built-in stuff doesn't do what you want.
To address your comment below:
In QuickTime (and QTKit), individual data types like audio and video are represented as tracks. It may not be immediately clear that QuickTime can open audio as well as video file formats. A common way to combine audio and video would be:
1. Create a QTMovie with your video file.
2. Create a QTMovie with your audio file.
3. Take the QTTrack object representing the audio and add it to the QTMovie with the video in it.
4. Flatten the movie, so it doesn't simply contain a reference to the other movie but actually contains the audio data.
5. Write the movie to disk.
Here's an example from Blender. You'll see how the A/V muxing is done in the end_qt function. There's also some use of Core Audio in there (AudioConverter*). (There's some classic QuickTime export code in quicktime_export.c but it doesn't seem to do audio.)