I'm using AudioToolbox to access m4a audio files with the following code:
UInt32 packetsToRead = 1; // Does it make a difference?
void *buffer = malloc(maxPacketSize * packetsToRead);
for (UInt64 packetIndex = 0; packetIndex < packetCount; packetIndex += ioNumberOfPackets)
{
    ioNumberOfPackets = packetsToRead;
    ioNumberOfBytes = maxPacketSize * ioNumberOfPackets;
    AudioFileReadPacketData(audioFile, false, &ioNumberOfBytes, NULL, packetIndex, &ioNumberOfPackets, buffer);
    for (UInt32 batchPacketIndex = 0; batchPacketIndex < ioNumberOfPackets; batchPacketIndex++)
    {
        // What do I do here to get the amplitude value? How do I get a sample value?
    }
}
My audio format is:
AppleM4A, 8000 Hz, 16 Bit, 4096 frames per packet
The solution was to use Extended Audio File Services. You just have to set up the conversion between the file's format and your PCM client format. I found the right way over there: Audio Processing: Playing with volume level.
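For anyone who lands here later, a minimal sketch of that approach (error handling omitted; the CFURLRef named url, the mono channel count, and the 4096-frame chunk size are my own assumptions, not from the original code):

ExtAudioFileRef extFile;
ExtAudioFileOpenURL(url, &extFile);

// Ask Extended Audio File Services to hand back plain 16-bit signed PCM,
// regardless of the compressed format stored in the m4a.
AudioStreamBasicDescription clientFormat = {0};
clientFormat.mSampleRate       = 8000.0;
clientFormat.mFormatID         = kAudioFormatLinearPCM;
clientFormat.mFormatFlags      = kAudioFormatFlagIsSignedInteger | kAudioFormatFlagIsPacked;
clientFormat.mChannelsPerFrame = 1;
clientFormat.mBitsPerChannel   = 16;
clientFormat.mBytesPerFrame    = 2;
clientFormat.mFramesPerPacket  = 1;
clientFormat.mBytesPerPacket   = 2;
ExtAudioFileSetProperty(extFile, kExtAudioFileProperty_ClientDataFormat, sizeof(clientFormat), &clientFormat);

SInt16 samples[4096];
AudioBufferList bufferList;
bufferList.mNumberBuffers = 1;
bufferList.mBuffers[0].mNumberChannels = 1;
bufferList.mBuffers[0].mDataByteSize   = sizeof(samples);
bufferList.mBuffers[0].mData           = samples;

UInt32 framesRead = 4096;
while (ExtAudioFileRead(extFile, &framesRead, &bufferList) == noErr && framesRead > 0)
{
    for (UInt32 i = 0; i < framesRead; i++)
    {
        SInt16 amplitude = samples[i]; // one decoded mono sample
    }
    framesRead = 4096;
    bufferList.mBuffers[0].mDataByteSize = sizeof(samples);
}
ExtAudioFileDispose(extFile);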
To get waveform data, you may first need to convert your compressed audio file into raw PCM samples, such as found inside a WAV file or other non-compressed audio format. Try AVAssetReader, et al.
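If you go the AVAssetReader route, a rough sketch might look like this (hedged: fileURL is my own name, and the output settings force 16-bit interleaved PCM):

AVURLAsset *asset = [AVURLAsset URLAssetWithURL:fileURL options:nil];
NSError *error = nil;
AVAssetReader *reader = [AVAssetReader assetReaderWithAsset:asset error:&error];
AVAssetTrack *track = [[asset tracksWithMediaType:AVMediaTypeAudio] objectAtIndex:0];
NSDictionary *settings = @{ AVFormatIDKey : @(kAudioFormatLinearPCM),
                            AVLinearPCMBitDepthKey : @16,
                            AVLinearPCMIsFloatKey : @NO,
                            AVLinearPCMIsBigEndianKey : @NO,
                            AVLinearPCMIsNonInterleaved : @NO };
AVAssetReaderTrackOutput *output =
    [AVAssetReaderTrackOutput assetReaderTrackOutputWithTrack:track outputSettings:settings];
[reader addOutput:output];
[reader startReading];

CMSampleBufferRef sampleBuffer;
while ((sampleBuffer = [output copyNextSampleBuffer]))
{
    CMBlockBufferRef blockBuffer = CMSampleBufferGetDataBuffer(sampleBuffer);
    size_t length;
    char *dataPointer;
    CMBlockBufferGetDataPointer(blockBuffer, 0, NULL, &length, &dataPointer);
    SInt16 *pcm = (SInt16 *)dataPointer;
    for (size_t i = 0; i < length / sizeof(SInt16); i++)
    {
        SInt16 amplitude = pcm[i]; // one 16-bit sample (interleaved if stereo)
    }
    CFRelease(sampleBuffer);
}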
I am trying to create an MP4 file using libavcodec. I am using a Raspberry Pi, which has a built-in hardware H264 encoder. It outputs Annex B H264 frames, and I am trying to figure out the proper way to save these frames into an MP4 container.
My first attempt simply wrote the MP4 header without building the extradata. The Raspberry Pi transmits the SPS and PPS info as the first frame, followed by an IDR frame and then the remaining H264 frames. I started with avformat_write_header, then repackaged the succeeding frames in an AVPacket and used
av_write_frame(outputFormatCtx, &pkt);
This works fine, but mplayer tries to decode the first frame (the one containing the SPS and PPS info) and fails to decode it. However, succeeding frames are decodable and the video plays fine from that point on.
I wanted to construct a proper MP4 file, so I wanted the SPS and PPS information to go in the MP4 header. I read that it should be in the avc1 atom and that I needed to build the extradata and somehow attach it to the outputFormatCtx.
This is my effort so far, after parsing the SPS and PPS from the returned encoder buffers. (I removed the leading 0x00000001 NAL delimiters before memcpying into sps and pps.)
if ((sps) && (pps)) {
//length of extradata is 6 bytes + 2 bytes for spslen + sps + 1 byte number of pps + 2 bytes for ppslen + pps
uint32_t extradata_len = 8 + spslen + 1 + 2 + ppslen;
outputStream->codecpar->extradata = (uint8_t*)av_mallocz(extradata_len);
outputStream->codecpar->extradata_size = extradata_len;
//start writing avcc extradata
outputStream->codecpar->extradata[0] = 0x01; //version
outputStream->codecpar->extradata[1] = sps[1]; //profile
outputStream->codecpar->extradata[2] = sps[2]; //compatibility
outputStream->codecpar->extradata[3] = sps[3]; //level
outputStream->codecpar->extradata[4] = 0xFC | 3; // reserved (6 bits), NALU length size - 1 (2 bits) which is 3
outputStream->codecpar->extradata[5] = 0xE0 | 1; // reserved (3 bits), num of SPS (5 bits) which is 1 sps
//write sps length
memcpy(&outputStream->codecpar->extradata[6],&spslen,2);
//Check to see if written correctly
uint16_t *cspslen=(uint16_t *)&outputStream->codecpar->extradata[6];
fprintf(stderr,"SPS length Wrote %d and read %d \n",spslen,*cspslen);
//Write the actual sps
int i = 0;
for (i=0; i<spslen; i++) {
outputStream->codecpar->extradata[8 + i] = sps[i];
}
for (size_t i = 0; i != outputStream->codecpar->extradata_size; ++i)
fprintf(stderr, "\\%02x", (unsigned char)outputStream->codecpar->extradata[i]);
fprintf(stderr,"\n");
//Number of pps
outputStream->codecpar->extradata[8 + spslen] = 0x01;
//Size of pps
memcpy(&outputStream->codecpar->extradata[8+spslen+1],&ppslen,2);
for (size_t i = 0; i != outputStream->codecpar->extradata_size; ++i)
fprintf(stderr, "\\%02x", (unsigned char)outputStream->codecpar->extradata[i]);
fprintf(stderr,"\n");
//Check to see if written correctly
uint16_t *cppslen=(uint16_t *)&outputStream->codecpar->extradata[+8+spslen+1];
fprintf(stderr,"PPS length Wrote %d and read %d \n",ppslen,*cppslen);
//Write actual PPS
for (i=0; i<ppslen; i++) {
outputStream->codecpar->extradata[8 + spslen + 1 + 2 + i] = pps[i];
}
//Output the extradata to check
for (size_t i = 0; i != outputStream->codecpar->extradata_size; ++i)
fprintf(stderr, "\\%02x", (unsigned char)outputStream->codecpar->extradata[i]);
fprintf(stderr,"\n");
//Access the outputFormatCtx internal AVCodecContext and copy the codecpar to it
AVCodecContext *avctx= outputFormatCtx->streams[0]->codec;
fprintf(stderr,"Extradata size output stream sps pps %d\n",outputStream->codecpar->extradata_size);
if(avcodec_parameters_to_context(avctx, outputStream->codecpar) < 0 ){
fprintf(stderr,"Error avcodec_parameters_to_context");
}
//Check to see if extradata was actually transferred to OutputformatCtx internal AVCodecContext
fprintf(stderr,"Extradata size after sps pps %d\n",avctx->extradata_size);
//Write the MP4 header
if(avformat_write_header(outputFormatCtx , NULL) < 0){
fprintf(stderr,"Error avformat_write_header");
ret = 1;
} else {
extradata_written=true;
fprintf(stderr,"EXTRADATA written\n");
}
}
The resulting video file does not play. The extradata is actually stored in the tail section of the MP4 file instead of in the avc1 atom in the MP4 header, so it is being written by libavformat, but likely by avformat_write_trailer.
I will post the PPS and SPS info here, and the final extradata byte string, in case the error is in forming the extradata.
Here is the buffer from the hardware encoder, with the SPS and PPS each preceded by a NAL delimiter:
\00\00\00\01\27\64\00\28\ac\2b\40\a0\cd\00\f1\22\6a\00\00\00\01\28\ee\04\f2\c0
Here is the 13-byte SPS:
27640028ac2b40a0cd00f1226a
Here is the 5-byte PPS:
28ee04f2c0
Here is the final extradata byte string, which is 29 bytes long. I hope I wrote the SPS and PPS sizes correctly.
\01\64\00\28\ff\e1\0d\00\27\64\00\28\ac\2b\40\a0\cd\00\f1\22\6a\01\05\00\28\ee\04\f2\c0
I did the same conversion from the 0x00000001 NAL delimiter to a 4-byte NAL size for the succeeding frames from the encoder, saved them to the file sequentially, and then wrote the trailer.
Any idea where the mistake is? How can I write the extradata to its proper location in the MP4 header?
Thanks,
Chris
Well, I found the problem. The Raspberry Pi is little endian, so I assumed that I had to write the SPS length, the PPS length, and each NALU size in little endian. They actually need to be written in big endian. After I made the change, the avcC atom showed up in mp4info and mplayer can now play back the video.
It's not necessary to access the outputFormatCtx's internal AVCodecContext and modify it.
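In other words (a hedged sketch using libavutil's big-endian write helpers instead of memcpy; nalu_payload_size is my own name for the per-frame NAL size):

#include <libavutil/intreadwrite.h>

/* Write the 2-byte SPS/PPS length fields big-endian into the avcC extradata */
AV_WB16(&outputStream->codecpar->extradata[6], spslen);
AV_WB16(&outputStream->codecpar->extradata[8 + spslen + 1], ppslen);

/* Likewise, each Annex B start code becomes a 4-byte big-endian NALU length
   in the packet payload before av_write_frame(): */
AV_WB32(pkt.data, nalu_payload_size);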
This post was very helpful:
Possible Locations for Sequence/Picture Parameter Set(s) for H.264 Stream
Thanks,
Chris
I saw one sample for encoding a WAV file; here is the sample:
sample for encoding
I have a doubt about this part of the code:
/* encode a single tone sound */
float t, tincr;
t = 0;
tincr = 2 * M_PI * 440.0 / c->sample_rate;
for (i = 0; i < 2000; i++) {
    for (j = 0; j < frame_size; j++) {
        samples[2*j]   = (int)(sin(t) * 10000);
        samples[2*j+1] = samples[2*j];
        t += tincr;
    }
/* encode the samples */
What is 2000 here? On what basis do we have to give this value? Because of this I think my encoding is not correct. Any suggestion will be helpful.
It seems to be an arbitrary number of repeated 'frames' that make up the sample. In a different code path it constructs another type of waveform in a similar way and mentions 2000 => 52 sec.
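If it helps, a rough sketch (my own naming, assuming the MP2 case from that example where frame_size is 1152) of deriving that count from a target duration instead of hard-coding 2000:

/* One outer iteration encodes one frame of frame_size samples, so: */
int    sample_rate = c->sample_rate;                 /* e.g. 44100 */
double duration_s  = 52.0;                           /* desired length in seconds */
int    num_frames  = (int)(duration_s * sample_rate / frame_size);
/* 52 s * 44100 Hz / 1152 samples per frame ≈ 1990, i.e. roughly the 2000
   used in the sample, which is where the "2000 => 52 sec" comes from. */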
I can get WdlResamplingSampleProvider to work for the 16-bit example provided by Mark Heath on his blog:
int outRate = 16000;
var inFile = @"test.mp3";
var outFile = @"test resampled WDL.wav";
using (var reader = new AudioFileReader(inFile))
{
var resampler = new WdlResamplingSampleProvider(reader, outRate);
WaveFileWriter.CreateWaveFile16(outFile, resampler);
}
except I'm reading a WAV file instead of an MP3 file. But I really need to work with 32-bit WAV files (input and output) without losing bit depth. Is there a way to do this?
WdlResamplingSampleProvider works with 32-bit floating point (IEEE) samples, so AudioFileReader has already converted the input to 32-bit float if it wasn't in that format already. It's then completely up to you what you do with the output: if you just call CreateWaveFile instead of CreateWaveFile16, you'll get a 32-bit floating point WAV file.
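Something along these lines should keep everything at 32-bit float (a sketch; ToWaveProvider() is the NAudio extension that wraps an ISampleProvider back up as an IEEE-float IWaveProvider, if I remember the name correctly):

using (var reader = new AudioFileReader(inFile))
{
    var resampler = new WdlResamplingSampleProvider(reader, outRate);
    // CreateWaveFile (not CreateWaveFile16) writes the 32-bit IEEE float stream as-is
    WaveFileWriter.CreateWaveFile(outFile, resampler.ToWaveProvider());
}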
If I open an audio file with extended audio file services, using the following client data format...
AudioStreamBasicDescription audioFormat;
memset(&audioFormat, 0, sizeof(audioFormat));
audioFormat.mSampleRate = 44100.0;
audioFormat.mFormatID = kAudioFormatLinearPCM;
audioFormat.mFormatFlags = kAudioFormatFlagIsBigEndian |
kAudioFormatFlagIsSignedInteger |
kAudioFormatFlagIsPacked;
audioFormat.mBytesPerPacket = 4;
audioFormat.mFramesPerPacket = 1;
audioFormat.mChannelsPerFrame = 2;
audioFormat.mBytesPerFrame = 4;
audioFormat.mBitsPerChannel = 16;
And configure an AudioBufferList like so....
AudioBufferList bufferList;
bufferList.mNumberBuffers = 1;
bufferList.mBuffers[0].mDataByteSize = bufferSize;
bufferList.mBuffers[0].mNumberChannels = audioFormat.mChannelsPerFrame;
bufferList.mBuffers[0].mData = buffer; //malloc(sizeof(UInt8) * 1024 * audioFormat.mBytesPerPacket)
How, then, is the data arranged in mData? If I iterate through the data like so:
for (int i = 0; i < frameCount; i++) {
    UInt8 somePieceOfAudioData = buffer[i];
}
then what is somePieceOfAudioData?
Is it a sample or a frame (left and right channels together)? If it's a sample then what channel is it a sample for? If for example it's a sample from the right channel, will buffer[i + 1] be a sample for the left channel?
Any ideas, links? Thank you!
Audio data is expected to be interleaved unless kAudioFormatFlagIsNonInterleaved is set. I've found that for Core Audio questions the best source of documentation is usually the headers; CoreAudioTypes.h contains the following comment:
Typically, when an ASBD is being used, the fields describe the
complete layout of the sample data in the buffers that are represented
by this description - where typically those buffers are represented by
an AudioBuffer that is contained in an AudioBufferList.
However, when an ASBD has the kAudioFormatFlagIsNonInterleaved flag,
the AudioBufferList has a different structure and semantic. In this
case, the ASBD fields will describe the format of ONE of the
AudioBuffers that are contained in the list, AND each AudioBuffer in
the list is determined to have a single (mono) channel of audio data.
Then, the ASBD's mChannelsPerFrame will indicate the total number of
AudioBuffers that are contained within the AudioBufferList - where
each buffer contains one channel. This is used primarily with the
AudioUnit (and AudioConverter) representation of this list - and won't
be found in the AudioHardware usage of this structure.
In your particular case, the buffer will consist of interleaved shorts starting with the left channel.
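So, as a sketch (assuming frameCount frames were actually read into the buffer, and noting that the ASBD above asks for big-endian samples, so on a little-endian device you'd swap before using the values):

SInt16 *samples = (SInt16 *)bufferList.mBuffers[0].mData;
for (int frame = 0; frame < frameCount; frame++)
{
    // Two interleaved 16-bit samples per frame: channel 0 then channel 1
    SInt16 ch0 = (SInt16)CFSwapInt16BigToHost(samples[2 * frame]);
    SInt16 ch1 = (SInt16)CFSwapInt16BigToHost(samples[2 * frame + 1]);
}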
Yeah, you're reading a frame and it's two 16-bit samples, Left and Right. (Actually, I'm not certain which is Left and which is Right. Hmmm.)
In addition to the header files, the class references built into Xcode are helpful. I find I'm using Option-click and Command-click in my code a lot when I'm sorting out these kinds of details. (For those new to Xcode: these clicks get you the info and docs, and the jump-to-source location, respectively.)
The upcoming book "Learning Core Audio: A Hands-on Guide to Audio Programming for Mac and iOS" by Kevin Avila and Chris Adamson does a nice job of explaining how all this works. It's available in "Rough Cut" form now at Safari Books Online:
http://my.safaribooksonline.com/book/audio/9780321636973
I am trying to get the audio buffer samples in real time (at a resolution of milliseconds).
I am using this code, but it gives me an error:
AudioBufferList *bufferList = NULL;
AudioBuffer audioBuffer = bufferList->mBuffers[0];
int bufferSize = audioBuffer.mDataByteSize / sizeof(SInt32);
SInt32 *frame = audioBuffer.mData;
SInt32 signalInput[22050];
for (int i = 0; i < bufferSize; i++)
{
    SInt32 currentSample = frame[i];
    *(signalInput + i) = currentSample;
    NSLog(@"Total power was: %ld ", (long)currentSample);
}
What am I doing wrong here?
I only need to get the audio samples; I don't want two pages of code (such as in the Apple docs).
Thanks.
What you want is inconsistent with what you are trying to do. A NULL buffer list can produce no samples.
You need the two-plus pages of code to properly configure the Audio Session and the RemoteIO Audio Unit (etc.) in order to get what you are trying to get. Otherwise there are no samples: the phone won't even turn on audio recording, or know how to set up the recording (there are bunches of options), before you turn it on. Study the docs and deal with it.
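For what it's worth, once that setup exists, the place where you finally see samples is the input callback you registered (via kAudioOutputUnitProperty_SetInputCallback). A hedged sketch, assuming a RemoteIO unit already configured for 16-bit mono PCM and stored in gRemoteIOUnit (both names are mine):

static AudioUnit gRemoteIOUnit; // created and configured during the "two pages" of setup

static OSStatus recordingCallback(void *inRefCon,
                                  AudioUnitRenderActionFlags *ioActionFlags,
                                  const AudioTimeStamp *inTimeStamp,
                                  UInt32 inBusNumber,
                                  UInt32 inNumberFrames,
                                  AudioBufferList *ioData)
{
    SInt16 samples[4096];
    if (inNumberFrames > 4096) inNumberFrames = 4096; // keep the stack buffer safe

    AudioBufferList bufferList;
    bufferList.mNumberBuffers = 1;
    bufferList.mBuffers[0].mNumberChannels = 1;
    bufferList.mBuffers[0].mDataByteSize   = inNumberFrames * sizeof(SInt16);
    bufferList.mBuffers[0].mData           = samples;

    // Pull the freshly captured frames from the RemoteIO input bus
    OSStatus err = AudioUnitRender(gRemoteIOUnit, ioActionFlags, inTimeStamp,
                                   inBusNumber, inNumberFrames, &bufferList);
    if (err != noErr) return err;

    for (UInt32 i = 0; i < inNumberFrames; i++)
    {
        SInt16 currentSample = samples[i]; // one 16-bit mono sample
    }
    return noErr;
}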