I'm studying a FLAC decoding problem, but I can't figure out how to get the FLAC frame length. Please help.
https://xiph.org/flac/format.html
I decoded METADATA_BLOCK_STREAMINFO and got the following data:
mMinBlock: 4096
mMaxBlock: 4096
mMinFrame: 1201
mMaxFrame: 12804
mSampleRate: 44100
mBitPerSample: 16
mTotalSample: 14170212
Then I started to analyse the first frame. Below is the info from the first frame header:
isFixBlock = true
blockSize = 12
sampleRate = 9
channel = 10
sampleSize = 4
number = 0
Blocking strategy is fixed-blocksize;
Block size: 1100, which means 256 * 2^(12-8) = 4096 samples;
Sample rate: 1001 : 44.1kHz;
Channel: 2;
Sample size: 100 : 16 bits per sample;
So from the above information, we know this frame has 4096 samples and the sample size is 16 bits per sample. That means this frame should be at least (ignoring the subframe headers, frame footer, etc.) 4096 * 16 / 8 = 8192 bytes long. But if I check the FLAC file manually, the offset gap between the first and second frames is only 2976 bytes, which means the length of the first frame is only 2976 bytes. Is there anything wrong with my calculation?
My goal is to get the frame offset and frame length of every frame. Is there a good way to do this? I know there is the sync code 0xFF F8, but scanning for it is very inefficient.
Thanks in advance!
From http://lists.xiph.org/pipermail/flac-dev/2016-February/005845.html
The frame length you calculated (8192 bytes) is that of the decoded frame, not of the FLAC frame. As it is compressed, it should indeed be smaller than 8192 bytes.
There is no direct way to find the frame length except finding where the
next frame starts.
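As a rough illustration of that "find where the next frame starts" approach, below is a minimal C sketch (my own, not from the FLAC project) that scans a buffer for candidate frame sync codes and reports the offset and length of each frame. A real decoder would additionally parse each candidate frame header and verify its CRC-8, because the sync pattern can also appear by chance inside compressed audio data.

#include <stdint.h>
#include <stddef.h>
#include <stdio.h>

/* Scan a buffer of FLAC frames for candidate frame boundaries.
 * The 14-bit frame sync code is 11111111111110: a 0xFF byte followed by a
 * byte whose top 7 bits are 1111100 (the low bit is the blocking-strategy
 * bit), i.e. (buf[i+1] & 0xFE) == 0xF8. */
static void find_frame_offsets(const uint8_t *buf, size_t len)
{
    size_t prev = (size_t)-1;

    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == 0xFF && (buf[i + 1] & 0xFE) == 0xF8) {
            /* A real decoder would parse the frame header here and check
             * its CRC-8 to reject false sync matches. */
            if (prev != (size_t)-1)
                printf("frame at offset %zu, length %zu bytes\n", prev, i - prev);
            prev = i;
        }
    }
    if (prev != (size_t)-1)
        printf("last frame at offset %zu (length = file size - offset)\n", prev);
}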
I chanced upon this statement while studying the allocation of RAM in an embedded device.
Basically, suppose we have an image sensor that uses the RGB 5-6-5 format and captures an image of size 320x240. The author proceeds to state:
"There are two 150-KB data buffers that contain the raw data from the image sensor (320x240 in the
RGB 5-6-5 format). "
Does anyone know how two 150 KB data buffers are enough to store the raw image? How do I calculate the image size in bits?
I tried calculating:
( 2^5 * 2^6 * 2^5 * 320 * 240 ) * 0.000125 = 629145.6 // in KB.
You should look closer at the definition of the RGB 5:6:5 format. Each pixel takes up 2 bytes (5 bits for red, 6 bits for green and 5 bits for blue, adding up to 16 bits == 2 bytes), so a raw 320x240 picture takes 320 * 240 * 2 bytes, i.e. 153600 bytes or 150 KB.
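For what it's worth, here is the same arithmetic as a tiny C snippet (the variable names are mine):

#include <stdio.h>

int main(void)
{
    const int width = 320, height = 240;
    const int bytes_per_pixel = 2;               /* RGB 5-6-5: 5 + 6 + 5 = 16 bits */
    const int raw_bytes = width * height * bytes_per_pixel;

    printf("%d bytes = %d KB\n", raw_bytes, raw_bytes / 1024);   /* 153600 bytes = 150 KB */
    return 0;
}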
For an iCE40 1k device, below is a snippet from the output of the command "iceunpack -vv example.bin".
I cannot understand why there are 332x144 bits.
My understanding, per [1], is that CRAM BLOCK[0] starts at logic tile (1,1) and should contain:
48 logic tiles, each 54x16,
14 IO tiles, each 18x16
How the "332 x 144" is calculated?
Where does the IO tile and logic tiles bits are mapped in CRAM BLOCK[0] bits?
e.g., which bits of CRAM BLOCK[0] indicates the bits for logic tile (1,1) and bits for IO tile (0,1)?
Set bank to 0.
Next command at offset 26: 0x01 0x01
CRAM Data [0]: 332 x 144 bits = 47808 bits = 5976 bytes
Next command at offset 6006: 0x11 0x01
[1]. http://www.clifford.at/icestorm/format.html
Thanks.
Height = 9 x 16 = 144 (1 I/O tile and 8 logic tiles).
Width = 18 + 42 + 5 x 54 = 330 (1 I/O tile, 1 RAM tile and 5 logic tiles), plus "two zero bytes" = 332.
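In case it helps, here is a small C snippet that just reproduces the arithmetic from the answer above (the tile counts come from that answer, nothing is derived from the bitstream itself):

#include <stdio.h>

int main(void)
{
    /* Height: 1 I/O tile row plus 8 logic tile rows, 16 config bits each. */
    const int height = 9 * 16;                    /* 144 */

    /* Width: 1 I/O tile (18 bits) + 1 RAM tile (42 bits) + 5 logic tiles
     * (54 bits each) = 330, plus the "two zero bytes" padding = 332. */
    const int width = 18 + 42 + 5 * 54 + 2;       /* 332 */

    printf("%d x %d = %d bits = %d bytes\n",
           width, height, width * height, width * height / 8);
    return 0;
}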
I have a PCAPNG file in which one UDP packet has a frame length of 187 bytes (1496 bits) and a data length of 472 bytes. For all other packets the frame length is greater than the data length.
Please correct me if I'm wrong; my basic understanding is that the frame length should be greater than the data length, because the frame length includes the data length.
1) Was this packet captured correctly?
2) In which cases can this happen?
I found that it is related to fragmented packets: this case occurs when a packet is reassembled.
I've started editing the RaspiStillYUV.c code. I eventually want to process the image I receive, but for now, I'm just working to understand it. Why am I working with YUV instead of RGB? So I can learn something new. I've made minor changes to the function camera_buffer_callback. All I am doing is the following:
fprintf(stderr, "GREAT SUCCESS! %d\n", buffer->length);
The line this is replacing:
bytes_written = fwrite(buffer->data, 1, buffer->length, pData->file_handle);
Now, the dimensions should be 2592 x 1944 (w x h) as set in the code. Working off of Wikipedia (YUV420), I have come to the conclusion that the file size should be w * h * 1.5, since the Y component has 1 byte of data for each pixel and the U and V components each have 1 byte of data for every 4 pixels (1 + 1/4 + 1/4 = 1.5). Great. Doing the math in Python:
>>> 2592 * 1944 * 1.5
7558272.0
Unfortunately, this does not line up with the output of my program:
GREAT SUCCESS! 7589376
That leaves a difference of 31104 bytes.
I figure that the buffer is allocated in fixed-size chunks (the output size is evenly divisible by 512). While I would like to understand that mystery, I'm fine with the fixed-size chunk explanation.
My question is if I am missing something. Are the extra bytes beyond the expected size meaningful in this format? Should they be ignored? Are my calculations off?
The documentation at this location supports your theory on padding: http://www.raspberrypi.org/wp-content/uploads/2013/07/RaspiCam-Documentation.pdf
Specifically:
Note that the image buffers saved in raspistillyuv are padded to a
horizontal size divisible by 16 (so there may be unused bytes at the
end of each line to make the width divisible by 16). Buffers are also
padded vertically to be divisible by 16, and in the YUV mode, each
plane of Y,U,V is padded in this way.
So my interpretation of this is the following.
The width is 2592 (divisible by 16 so this is ok).
The height is 1944, which is 8 short of being divisible by 16, so an extra 8*2592 pixels are added (also multiplied by 1.5), thus giving your 31104 extra bytes.
Although this somewhat helps with the size of the file, it doesn't explain the structure of the YUV output properly. I am having a look at this description to see if this provides a hint to start with: http://en.wikipedia.org/wiki/YUV#Y.27UV420p_.28and_Y.27V12_or_YV12.29_to_RGB888_conversion
From this I believe it is as follows:
Y Channel:
2592 * (1944+8) = 5059584
U Channel:
1296 * (972+4) = 1264896
V Channel:
1296 * (972+4) = 1264896
Giving a sum of:
5059584 + 2*1264896 = 7589376
This makes the numbers add up, so the only thing left is to confirm whether this interpretation is correct.
I am also trying to do the YUV decode (for image comparisons), so if you can confirm that this actually corresponds to what you are reading in the YUV file, that would be much appreciated.
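To make that interpretation concrete, here is a minimal C sketch of how the padded plane sizes and offsets would work out under this reading of the documentation (the round_up helper and the offset layout are my assumptions, not verified against the actual buffer):

#include <stdio.h>

/* Round n up to the next multiple of m. */
static int round_up(int n, int m) { return ((n + m - 1) / m) * m; }

int main(void)
{
    const int width = 2592, height = 1944;

    /* Each plane is padded so its dimensions are divisible by 16;
     * the U and V planes are half the luma size in each direction. */
    const int y_w  = round_up(width, 16),     y_h  = round_up(height, 16);
    const int uv_w = round_up(width / 2, 16), uv_h = round_up(height / 2, 16);

    const int y_size  = y_w * y_h;       /* 2592 * 1952 = 5059584 */
    const int uv_size = uv_w * uv_h;     /* 1296 * 976  = 1264896 */

    printf("Y plane: offset 0, %d bytes\n", y_size);
    printf("U plane: offset %d, %d bytes\n", y_size, uv_size);
    printf("V plane: offset %d, %d bytes\n", y_size + uv_size, uv_size);
    printf("total:   %d bytes\n", y_size + 2 * uv_size);   /* 7589376 */
    return 0;
}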
You have to read the manual carefully. Buffers are padded to multiples of 16, but colour data is half-size, so your image size needs to be in multiples of 32 to avoid problems with padding breaking external software.
I'm new to the iOS and its C underpinnings, but not to programming in general. My dilemma is this. I'm implementing an echo effect in a complex AudioUnits based application. The application needs reverb, echo, and compression, among other things. However, the echo only works right when I use a particular AudioStreamBasicDescription format for the audio samples generated in my app. This format however doesn't work with the other AudioUnits.
While there are other ways to solve this problem, fixing the bit-twiddling in the echo algorithm might be the most straightforward approach.
The *AudioStreamBasicDescription* that works with echo has an mFormatFlags value of kAudioFormatFlagsAudioUnitCanonical. Its specifics are:
AudioUnit Stream Format (ECHO works, NO AUDIO UNITS)
Sample Rate: 44100
Format ID: lpcm
Format Flags: 3116 = kAudioFormatFlagsAudioUnitCanonical
Bytes per Packet: 4
Frames per Packet: 1
Bytes per Frame: 4
Channels per Frame: 2
Bits per Channel: 32
Set ASBD on input
Set ASBD on output
au SampleRate rate: 0.000000, 2 channels, 12 formatflags, 1819304813 mFormatID, 16 bits per channel
The stream format that works with AudioUnits is the same except for the mFormatFlags: kAudioFormatFlagIsFloat | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked | kAudioFormatFlagIsNonInterleaved. Its specifics are:
AudioUnit Stream Format (NO ECHO, AUDIO UNITS WORK)
Sample Rate: 44100
Format ID: lpcm
Format Flags: 41
Bytes per Packet: 4
Frames per Packet: 1
Bytes per Frame: 4
Channels per Frame: 2
Bits per Channel: 32
Set ASBD on input
Set ASBD on output
au SampleRate rate: 44100.000000, 2 channels, 41 formatflags, 1819304813 mFormatID, 32 bits per channel
In order to create the echo effect I use two functions that bit-shift sample data into SInt16 space and back. As I said, this works for the kAudioFormatFlagsAudioUnitCanonical format, but not the other. When it fails, the sounds are clipped and distorted, but they are there. I think this indicates that the difference between these two formats is how the data is arranged in the Float32.
// convert sample vector from fixed point 8.24 to SInt16
void fixedPointToSInt16( SInt32 * source, SInt16 * target, int length ) {
    int i;
    for( i = 0; i < length; i++ ) {
        target[i] = (SInt16) (source[i] >> 9);
        //target[i] *= 0.003;
    }
}
*As you can see, I tried modifying the amplitude of the samples to get rid of the clipping; clearly that didn't work.
// convert sample vector from SInt16 to fixed point 8.24
void SInt16ToFixedPoint( SInt16 * source, SInt32 * target, int length ) {
    int i;
    for( i = 0; i < length; i++ ) {
        target[i] = (SInt32) (source[i] << 9);
        if( source[i] < 0 ) {
            target[i] |= 0xFF000000;
        }
        else {
            target[i] &= 0x00FFFFFF;
        }
    }
}
If I can determine the difference between kAudioFormatFlagIsFloat | kAudioFormatFlagsNativeEndian | kAudioFormatFlagIsPacked | kAudioFormatFlagIsNonInterleaved and kAudioFormatFlagsAudioUnitCanonical, then I can modify the above methods accordingly. But I'm not sure how to figure that out. The CoreAudio documentation is enigmatic, but from what I've read there, and gleaned from the CoreAudioTypes.h file, both mFormatFlags values refer to the same fixed point 8.24 format. Clearly something is different, but I can't figure out what.
Thanks for reading through this long question, and thanks in advance for any insight you can provide.
kAudioFormatFlagIsFloat means that the buffer contains floating point values. If mBitsPerChannel is 32 then you are dealing with float data (also called Float32), and if it is 64 you are dealing with double data.
kAudioFormatFlagsNativeEndian refers to the fact that the data in the buffer matches the endianness of the processor, so you don't have to worry about byte swapping.
kAudioFormatFlagIsPacked means that every bit in the data is significant. For example, if you store 24 bit audio data in 32 bits, this flag will not be set.
kAudioFormatFlagIsNonInterleaved means that each individual buffer consists of one channel of data. It is common for audio data to be interleaved, with the samples alternating between L and R channels: LRLRLRLR. For DSP applications it is often easier to deinterleave the data and work on one channel at a time.
I think in your case the error is that you are treating floating point data as fixed point. Float data is generally scaled to the interval [-1, +1). To convert float to SInt16 you need to multiply each sample by the maximum 16-bit value (1u << 15, 32768) and then clip to the interval [-32768, 32767].
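Based on that, a minimal sketch of float conversion routines analogous to your fixed-point pair might look like the following (the function names are mine, and int16_t/float stand in for Core Audio's SInt16/Float32):

#include <stdint.h>

/* Convert Float32 samples in [-1.0, 1.0] to SInt16, with clipping. */
void floatToSInt16(const float *source, int16_t *target, int length)
{
    int i;
    for (i = 0; i < length; i++) {
        float scaled = source[i] * 32768.0f;
        if (scaled > 32767.0f)  scaled = 32767.0f;
        if (scaled < -32768.0f) scaled = -32768.0f;
        target[i] = (int16_t)scaled;
    }
}

/* Convert SInt16 samples back to Float32 in [-1.0, 1.0). */
void SInt16ToFloat(const int16_t *source, float *target, int length)
{
    int i;
    for (i = 0; i < length; i++)
        target[i] = (float)source[i] / 32768.0f;
}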