I'm trying to do an FFT on the iPhone, and I realised that I had not overlapped my input prior to windowing. I was wondering if anyone could give me some insight into how to properly overlap my input buffer.
I want to overlap bufferSamples by a factor of 4, and I understand that this is done using memmove, but I can't figure out how to get it to work for overlapping.
enum
{
frameSize = 2048,
overlap = 4,
range = 8192,
step = frameSize/overlap,
};
static COMPLEX_SPLIT A;
// For each buffer in the audio buffer list...
for (int j = 0; j < audioBufferList.mNumberBuffers; j++)
{
// Declaring samples from audio buffer list
SInt16 *bufferSamples = (SInt16*)audioBufferList.mBuffers[j].mData;
// Overlapping here?
////////////////////////
//// vDSP FUNCTIONS ////
////////////////////////
// Creating Hann window function
for (int i = 0; i < frameSize; i++)
{
double window = 0.5 * (1.0 - cos((2.0 * M_PI * i) / (frameSize - 1)));
// Applying window to each sample
A.realp[i] = window * bufferSamples[i];
A.imagp[i] = 0;
}
// Further DSP...
To get an overlap factor of 4, you need to save the last 75% of the data that, before windowing, was input to the previous FFT. Then use that saved data as the first 75% of the current FFT, with only the last 25% from current or not yet used data. memmove can be used to copy data to and from the temporary save data buffers. Repeat as necessary to use up the data available.
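The sliding described above might look roughly like the following. This is only a sketch: `OverlapState`, `overlap_push` and `overlap_process` are made-up names, the constants mirror the enum in the question, and the input length is assumed to be a multiple of the step size.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { FRAME_SIZE = 2048, OVERLAP = 4, STEP = FRAME_SIZE / OVERLAP };

/* Hypothetical helper: holds the most recent FRAME_SIZE input samples. */
typedef struct {
    int16_t frame[FRAME_SIZE];
} OverlapState;

/* Slide the frame forward by STEP samples: keep the newest 75% of the
   previous frame and append STEP fresh samples at the end. After the call,
   state->frame is ready to be windowed and passed to the FFT. */
static void overlap_push(OverlapState *state, const int16_t *fresh)
{
    /* Source and destination overlap, so memmove (not memcpy) is needed. */
    memmove(state->frame, state->frame + STEP,
            (FRAME_SIZE - STEP) * sizeof(int16_t));
    memcpy(state->frame + (FRAME_SIZE - STEP), fresh,
           STEP * sizeof(int16_t));
}

/* Consume an input buffer STEP samples at a time and return the number of
   frames produced. `count` is assumed to be a multiple of STEP. */
static int overlap_process(OverlapState *state, const int16_t *input,
                           size_t count,
                           void (*on_frame)(const int16_t *frame))
{
    int frames = 0;
    assert(count % STEP == 0);
    for (size_t offset = 0; offset + STEP <= count; offset += STEP) {
        overlap_push(state, input + offset);
        if (on_frame)
            on_frame(state->frame);   /* window + FFT would go here */
        frames++;
    }
    return frames;
}
```

The state has to persist between audio callbacks (zero it once at start-up), so that the first 75% of each frame really is the tail of the previous one.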
I have 128-bit data in a q-register. I want to sum the individual 16-bit blocks in this q-register to get a final 16-bit sum (any carry beyond 16 bits should be taken and added to the LSB of this 16-bit number).
what I want to achieve is:
VADD.U16 (some 16-bit variable) {q0[0] q0[1] q0[2] ......... q0[7]}
but using intrinsics,
would appreciate if someone could give me an algorithm for this.
I tried using pairwise addition, but I'm ending up with a rather clumsy solution.
Here's how it looks:
int convert128to16(uint16x8_t data128){
uint16_t data16 = 0;
uint16x4_t ddata;
print16(data128);
uint32x4_t data = vpaddlq_u16(data128);
print32(data);
uint16x4_t data_hi = vget_high_u16(data);
print16x4(data_hi);
uint16x4_t data_low = vget_low_u16(data);
print16x4(data_low);
ddata = vpadd_u16( data_hi, data_low);
print16x4(ddata);
}
It's still incomplete and a bit clumsy. Any help would be much appreciated.
You can use the horizontal add instructions:
Here is a fragment:
uint16x8_t input = /* load your data128 here */
uint64x2_t temp = vpaddlq_u32 (vpaddlq_u16 (input));
uint64x1_t result = vadd_u64 (vget_high_u64 (temp),
vget_low_u64 (temp));
// result now contains the sum of all 16 bit unsigned words
// stored in data128.
// to add the values that overflow from 16 bit just do another 16 bit
// horizontal addition and return the lowest 16 bit as the final result:
uint16x4_t w = vpadd_u16 (
vreinterpret_u16_u64 (result),
vreinterpret_u16_u64 (result));
uint16_t wrappedResult = vget_lane_u16 (w, 0);
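For checking an intrinsics version against known inputs, a plain scalar reference can be handy. The sketch below is not NEON code; `sum16_end_around` is a made-up name. One detail it makes explicit: folding the carry back in once can itself carry out of 16 bits (for example, seven words of 0xFFFF plus one word of 6), so it keeps folding until no carry remains.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: sum 16-bit words with end-around carry
   (one's-complement style addition). */
static uint16_t sum16_end_around(const uint16_t *words, size_t count)
{
    uint32_t sum = 0;
    /* With count <= 65536 the 32-bit accumulator cannot overflow. */
    assert(count <= 65536);
    for (size_t i = 0; i < count; i++)
        sum += words[i];
    /* Fold the carries back into the low 16 bits until none remain. */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);
    return (uint16_t)sum;
}
```

For the eight lanes of a q-register, `count` would simply be 8.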
If your goal is to sum the 16-bit chunks (modulo 16 bits), the following fragment would do:
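uint16_t convert128to16(uint16x8_t data128){
    // Add the high half to the low half: 8 lanes -> 4 lanes
    uint16x4_t sum = vadd_u16(vget_low_u16(data128), vget_high_u16(data128));
    // Two pairwise adds leave the total (modulo 2^16) in every lane
    sum = vpadd_u16(sum, sum);
    sum = vpadd_u16(sum, sum);
    return vget_lane_u16(sum, 0);
}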
If I open an audio file with extended audio file services, using the following client data format...
AudioStreamBasicDescription audioFormat;
memset(&audioFormat, 0, sizeof(audioFormat));
audioFormat.mSampleRate = 44100.0;
audioFormat.mFormatID = kAudioFormatLinearPCM;
audioFormat.mFormatFlags = kAudioFormatFlagIsBigEndian |
kAudioFormatFlagIsSignedInteger |
kAudioFormatFlagIsPacked;
audioFormat.mBytesPerPacket = 4;
audioFormat.mFramesPerPacket = 1;
audioFormat.mChannelsPerFrame = 2;
audioFormat.mBytesPerFrame = 4;
audioFormat.mBitsPerChannel = 16;
And configure an AudioBufferList like so....
AudioBufferList bufferList;
bufferList.mNumberBuffers = 1;
bufferList.mBuffers[0].mDataByteSize = bufferSize;
bufferList.mBuffers[0].mNumberChannels = audioFormat.mChannelsPerFrame;
bufferList.mBuffers[0].mData = buffer; //malloc(sizeof(UInt8) * 1024 * audioFormat.mBytesPerPacket)
How, then, is the data arranged in mData? If I iterate through the data like so
for (int i = 0; i < frameCount; i++) {
UInt8 somePieceOfAudioData = buffer[i];
}
then what is somePieceOfAudioData?
Is it a sample or a frame (left and right channels together)? If it's a sample then what channel is it a sample for? If for example it's a sample from the right channel, will buffer[i + 1] be a sample for the left channel?
Any ideas, links? Thank you!
Audio data is expected to be interleaved unless the kAudioFormatFlagIsNonInterleaved is set. I've found that for Core Audio questions the best source of documentation is usually the headers. CoreAudioTypes.h contains the following comment:
Typically, when an ASBD is being used, the fields describe the
complete layout of the sample data in the buffers that are represented
by this description - where typically those buffers are represented by
an AudioBuffer that is contained in an AudioBufferList.
However, when an ASBD has the kAudioFormatFlagIsNonInterleaved flag,
the AudioBufferList has a different structure and semantic. In this
case, the ASBD fields will describe the format of ONE of the
AudioBuffers that are contained in the list, AND each AudioBuffer in
the list is determined to have a single (mono) channel of audio data.
Then, the ASBD's mChannelsPerFrame will indicate the total number of
AudioBuffers that are contained within the AudioBufferList - where
each buffer contains one channel. This is used primarily with the
AudioUnit (and AudioConverter) representation of this list - and won't
be found in the AudioHardware usage of this structure.
In your particular case, the buffer will consist of interleaved shorts starting with the left channel.
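To make the layout concrete: with the format in the question, each `buffer[i]` read as a `UInt8` is a single byte, i.e. half of one 16-bit sample, and one frame is 4 bytes. A minimal sketch of pulling the two channels apart, assuming the big-endian flag from the question and the left-first order described above (`read_be16` and `deinterleave_stereo_be16` are made-up names):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one big-endian signed 16-bit sample from two bytes. */
static int16_t read_be16(const uint8_t *p)
{
    return (int16_t)(uint16_t)((p[0] << 8) | p[1]);
}

/* Split interleaved 16-bit stereo (4 bytes per frame: L hi, L lo, R hi, R lo)
   into separate left/right arrays of frameCount samples each. */
static void deinterleave_stereo_be16(const uint8_t *bytes, size_t frameCount,
                                     int16_t *left, int16_t *right)
{
    assert(bytes && left && right);
    for (size_t frame = 0; frame < frameCount; frame++) {
        const uint8_t *p = bytes + frame * 4;   /* mBytesPerFrame == 4 */
        left[frame]  = read_be16(p);
        right[frame] = read_be16(p + 2);
    }
}
```

If the client format were native-endian instead, casting `mData` to `SInt16 *` and reading `samples[2 * frame]` and `samples[2 * frame + 1]` would give the same two values per frame.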
Yeah, you're reading a frame and it's two 16-bit samples, Left and Right. (Actually, I'm not certain which is Left and which is Right. Hmmm.)
In addition to the header files, the class references built into Xcode are helpful. I find I'm using "option-click" and "command-click" in my code a lot when I'm sorting out these kinds of details. (For those new to Xcode: these clicks get you the info and docs, and the jump-to-source location, respectively.)
The upcoming book "Learning Core Audio: A Hands-on Guide to Audio Programming for Mac and iOS" by Kevin Avila and Chris Adamson does a nice job of explaining how all this works. It's available in "Rough Cut" form now at Safari Books Online:
http://my.safaribooksonline.com/book/audio/9780321636973
I'm using AudioToolbox to access m4a audio files with following code:
UInt32 packetsToRead = 1; //Does it make a difference?
void *buffer = malloc(maxPacketSize * packetsToRead);
for (UInt64 packetIndex = 0; packetIndex < packetCount; packetIndex++)
{
ioNumberOfPackets = packetsToRead;
ioNumberOfBytes = maxPacketSize * ioNumberOfPackets;
AudioFileReadPacketData(audioFile, NO, &ioNumberOfBytes, NULL, packetIndex, &ioNumberOfPackets, buffer);
for (UInt32 batchPacketIndex = 0; batchPacketIndex < ioNumberOfPackets; batchPacketIndex++)
{
//What to do here to get amplitude value? How to get sample value?
}
packetIndex+=ioNumberOfPackets;
}
My audio format is:
AppleM4A, 8000 Hz, 16 Bit, 4096 frames per packet
The solution was to use Extended Audio File Services. You just have to set up the conversion between the file's format and a PCM client format. I found the right approach in the question "Audio Processing: Playing with volume level".
To get waveform data, you may first need to convert your compressed audio file into raw PCM samples, such as those found inside a WAV file or another uncompressed audio format. Try AVAssetReader, et al.
I was recently asked to complete a task for a C++ role. However, as it was decided not to progress my application any further, I thought I would post it here for some feedback, advice, improvements, or reminders of concepts I've forgotten.
The task was:
The following data is a time series of integer values
int timeseries[32] = {67497, 67376, 67173, 67235, 67057, 67031, 66951,
66974, 67042, 67025, 66897, 67077, 67082, 67033, 67019, 67149, 67044,
67012, 67220, 67239, 66893, 66984, 66866, 66693, 66770, 66722, 66620,
66579, 66596, 66713, 66852, 66715};
The series might be, for example, the closing price of a stock each day
over a 32 day period.
As stored above, the data will occupy 32 x sizeof(int) bytes = 128 bytes
assuming 4 byte ints.
Using delta encoding, write a function to compress, and a function to
uncompress data like the above.
OK, so before this point I had never looked at compression, so my solution is far from perfect. I approached the problem by compressing the array of integers into an array of byte arrays. When representing an integer as bytes, I calculate its most significant byte (msb), keep everything up to that point, and throw the rest away. The result is then added to the byte array. For negative values I increment the msb by 1, so that the leading 1 bits are kept and positive and negative values can be told apart when decoding.
When decoding, I parse this jagged byte array and simply reverse the actions performed when compressing. As mentioned, I had never looked at compression prior to this task, so I came up with my own method to compress the data. I had been looking at C++/CLI recently and had not really used it before, so I decided to write it in that language for no particular reason. Below is the class, with a unit test at the very bottom. Any advice, improvements, or enhancements will be much appreciated.
Thanks.
array<array<Byte>^>^ CDeltaEncoding::CompressArray(array<int>^ data)
{
int temp = 0;
int original;
int size = 0;
array<int>^ tempData = gcnew array<int>(data->Length);
data->CopyTo(tempData, 0);
array<array<Byte>^>^ byteArray = gcnew array<array<Byte>^>(tempData->Length);
for (int i = 0; i < tempData->Length; ++i)
{
original = tempData[i];
tempData[i] -= temp;
temp = original;
int msb = GetMostSignificantByte(tempData[i]);
byteArray[i] = gcnew array<Byte>(msb);
System::Buffer::BlockCopy(BitConverter::GetBytes(tempData[i]), 0, byteArray[i], 0, msb );
size += byteArray[i]->Length;
}
return byteArray;
}
array<int>^ CDeltaEncoding::DecompressArray(array<array<Byte>^>^ buffer)
{
System::Collections::Generic::List<int>^ decodedArray = gcnew System::Collections::Generic::List<int>();
int temp = 0;
for (int i = 0; i < buffer->Length; ++i)
{
int retrievedVal = GetValueAsInteger(buffer[i]);
decodedArray->Add(retrievedVal);
decodedArray[i] += temp;
temp = decodedArray[i];
}
return decodedArray->ToArray();
}
int CDeltaEncoding::GetMostSignificantByte(int value)
{
array<Byte>^ tempBuf = BitConverter::GetBytes(Math::Abs(value));
int msb = tempBuf->Length;
for (int i = tempBuf->Length -1; i >= 0; --i)
{
if (tempBuf[i] != 0)
{
msb = i + 1;
break;
}
}
if (!IsPositiveInteger(value))
{
//We need an extra byte to differentiate the negative integers
msb++;
}
return msb;
}
bool CDeltaEncoding::IsPositiveInteger(int value)
{
// Treat zero as positive; value / Math::Abs(value) would divide by zero
return value >= 0;
}
int CDeltaEncoding::GetValueAsInteger(array<Byte>^ buffer)
{
array<Byte>^ tempBuf;
if(buffer->Length % 2 == 0)
{
//With even integers there is no need to allocate a new byte array
tempBuf = buffer;
}
else
{
tempBuf = gcnew array<Byte>(4);
System::Buffer::BlockCopy(buffer, 0, tempBuf, 0, buffer->Length );
unsigned int val = buffer[buffer->Length-1] &= 0xFF;
if ( val == 0xFF )
{
//We have negative integer compressed into 3 bytes
//Copy over the this last byte as well so we keep the negative pattern
System::Buffer::BlockCopy(buffer, buffer->Length-1, tempBuf, buffer->Length, 1 );
}
}
switch(tempBuf->Length)
{
case sizeof(short):
return BitConverter::ToInt16(tempBuf,0);
case sizeof(int):
default:
return BitConverter::ToInt32(tempBuf,0);
}
}
And then in a test class I had:
void CTestDeltaEncoding::TestCompression()
{
array<array<Byte>^>^ byteArray = CDeltaEncoding::CompressArray(m_testdata);
array<int>^ decompressedArray = CDeltaEncoding::DecompressArray(byteArray);
int totalBytes = 0;
for (int i = 0; i<byteArray->Length; i++)
{
totalBytes += byteArray[i]->Length;
}
Assert::IsTrue(m_testdata->Length * sizeof(int) > totalBytes, "Expected the total bytes to be less than the original array!!");
//Expected totalBytes = 53
}
This smells a lot like homework to me. The crucial phrase is: "Using delta encoding."
Delta encoding means you encode the delta (difference) between each number and the next:
67497, 67376, 67173, 67235, 67057, 67031, 66951, 66974, 67042, 67025, 66897, 67077, 67082, 67033, 67019, 67149, 67044, 67012, 67220, 67239, 66893, 66984, 66866, 66693, 66770, 66722, 66620, 66579, 66596, 66713, 66852, 66715
would turn into:
[Base: 67497]: -121, -203, +62
and so on. Assuming 8-bit bytes, the original numbers require 3 bytes apiece (and given how few compilers provide a 3-byte integer type, you're normally going to end up with 4 bytes apiece). From the looks of things, the differences will fit quite easily in 2 bytes apiece, and if you can ignore one (or possibly two) of the least significant bits, you can fit them in one byte apiece.
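As a rough illustration of the lossless two-bytes-per-difference variant, something like the following would do. It is a sketch under assumptions: the `DeltaBlock` layout, the fixed capacity of 32 values and the function names are all invented here, and plain C is used rather than C++/CLI.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Compressed layout (an assumption of this sketch): the first value is
   stored in full, every later value as a 16-bit difference. */
typedef struct {
    int32_t base;
    int16_t deltas[31];   /* hypothetical fixed capacity: 32 values */
} DeltaBlock;

/* Returns the number of values encoded, or 0 if count is unsupported or a
   difference does not fit in 16 bits. */
static size_t delta_compress(const int *values, size_t count, DeltaBlock *out)
{
    if (count == 0 || count > 32)
        return 0;
    out->base = values[0];
    for (size_t i = 1; i < count; i++) {
        long long diff = (long long)values[i] - values[i - 1];
        if (diff < INT16_MIN || diff > INT16_MAX)
            return 0;
        out->deltas[i - 1] = (int16_t)diff;
    }
    return count;
}

/* Rebuild the original values by accumulating the differences. */
static void delta_uncompress(const DeltaBlock *in, size_t count, int *values)
{
    assert(count >= 1 && count <= 32);
    values[0] = in->base;
    for (size_t i = 1; i < count; i++)
        values[i] = values[i - 1] + in->deltas[i - 1];
}
```

For the 32 values in the question this stores 4 bytes for the base plus 31 two-byte differences, roughly half of the original 128 bytes, and decoding gives back the input exactly.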
Delta encoding is most often used for things like sound encoding where you can "fudge" the accuracy at times without major problems. For example, if you have a change from one sample to the next that's larger than you've left space to encode, you can encode a maximum change in the current difference, and add the difference to the next delta (and if you don't mind some back-tracking, you can distribute some to the previous delta as well). This will act as a low-pass filter, limiting the gradient between samples.
For example, in the series you gave, a simple delta encoding requires ten bits to represent all the differences. By dropping the LSB, however, nearly all the samples (all but one, in fact) can be encoded in 8 bits. That one has a difference (right shifted one bit) of -173, so if we represent it as -128, we have 45 left. We can distribute that error evenly between the preceding and following sample. In that case, the output won't be an exact match for the input, but if we're talking about something like sound, the difference probably won't be particularly obvious.
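The clipping idea can be sketched as below. Note the simplifications: this version does not drop the LSB and only pushes the clipped remainder forward to the following deltas, without the back-tracking mentioned above. The names `delta8_encode` and `delta8_decode` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Lossy 8-bit delta encoder. Each delta is measured against the value the
   decoder will have reconstructed so far, so whatever is clipped from one
   step is automatically added to the next one. */
static void delta8_encode(const int *values, size_t count, int8_t *deltas)
{
    int reconstructed;
    assert(count >= 1);
    reconstructed = values[0];   /* the base value is stored separately */
    for (size_t i = 1; i < count; i++) {
        int diff = values[i] - reconstructed;
        if (diff > INT8_MAX) diff = INT8_MAX;   /* clip to +127 */
        if (diff < INT8_MIN) diff = INT8_MIN;   /* clip to -128 */
        deltas[i - 1] = (int8_t)diff;
        reconstructed += diff;
    }
}

/* Decoder: accumulate the (possibly clipped) deltas onto the base value. */
static void delta8_decode(int base, const int8_t *deltas, size_t count,
                          int *values)
{
    assert(count >= 1);
    values[0] = base;
    for (size_t i = 1; i < count; i++)
        values[i] = values[i - 1] + deltas[i - 1];
}
```

A jump from 0 to 300 followed by a flat stretch comes out as 0, 127, 254, 300: the output catches up over a few samples instead of jumping, which is the low-pass effect described above.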
I did mention that it was an exercise that I had to complete and the solution that I received was deemed not good enough, so I wanted some constructive feedback seeing as actual companies never decide to tell you what you did wrong.
When the array is compressed I store the differences and not the original values except the first as this was my understanding. If you had looked at my code I have provided a full solution but my question was how bad was it?