How to write the avc1 atom with libavcodec for MP4 file using H264 codec - header

I am trying to create an MP4 file using libavcodec. I am using a raspberry pi which has a built in hardware H264 encoder. It outputs Annex B H264 frames and I am trying to see the proper way to save these frames into an MP4 container.
My first attempt simply wrote the MP4 header without building the extradata. The raspberry pi transmits as first frame the SPS and PPS info. This is followed by IDR and then the remaining H264 frames. I started with avformat_write_header and then repackaged the succeeding frames in AVPacket and used
av_write_frame(outputFormatCtx, &pkt);
This works fine but mplayer tries to decode the first frame ( the one containing SPS and PPS info ) and fails with decoding that frame. However, succeeding frames are decodable and the video plays fine from that point on.
I wanted to construct a proper MP4 file so I wanted the SPS and PPS information to go the MP4 header. I read that it should be in the avc1 atom and that I needed to build the extradata and somehow link it to the outputformatctx.
This is my effort so far, after parsing sps and pps from the returned encoder buffers. (I removed the leading 0x0000001 nal delimiters prior to memcpying to sps and pps).
if ((sps) && (pps)) {
//length of extradata is 6 bytes + 2 bytes for spslen + sps + 1 byte number of pps + 2 bytes for ppslen + pps
uint32_t extradata_len = 8 + spslen + 1 + 2 + ppslen;
outputStream->codecpar->extradata = (uint8_t*)av_mallocz(extradata_len);
outputStream->codecpar->extradata_size = extradata_len;
//start writing avcc extradata
outputStream->codecpar->extradata[0] = 0x01; //version
outputStream->codecpar->extradata[1] = sps[1]; //profile
outputStream->codecpar->extradata[2] = sps[2]; //comatibility
outputStream->codecpar->extradata[3] = sps[3]; //level
outputStream->codecpar->extradata[4] = 0xFC | 3; // reserved (6 bits), NALU length size - 1 (2 bits) which is 3
outputStream->codecpar->extradata[5] = 0xE0 | 1; // reserved (3 bits), num of SPS (5 bits) which is 1 sps
//write sps length
memcpy(&outputStream->codecpar->extradata[6],&spslen,2);
//Check to see if written correctly
uint16_t *cspslen=(uint16_t *)&outputStream->codecpar->extradata[6];
fprintf(stderr,"SPS length Wrote %d and read %d \n",spslen,*cspslen);
//Write the actual sps
int i = 0;
for (i=0; i<spslen; i++) {
outputStream->codecpar->extradata[8 + i] = sps[i];
}
for (size_t i = 0; i != outputStream->codecpar->extradata_size; ++i)
fprintf(stderr, "\\%02x", (unsigned char)outputStream->codecpar->extradata[i]);
fprintf(stderr,"\n");
//Number of pps
outputStream->codecpar->extradata[8 + spslen] = 0x01;
//Size of pps
memcpy(&outputStream->codecpar->extradata[8+spslen+1],&ppslen,2);
for (size_t i = 0; i != outputStream->codecpar->extradata_size; ++i)
fprintf(stderr, "\\%02x", (unsigned char)outputStream->codecpar->extradata[i]);
fprintf(stderr,"\n");
//Check to see if written correctly
uint16_t *cppslen=(uint16_t *)&outputStream->codecpar->extradata[+8+spslen+1];
fprintf(stderr,"PPS length Wrote %d and read %d \n",ppslen,*cppslen);
//Write actual PPS
for (i=0; i<ppslen; i++) {
outputStream->codecpar->extradata[8 + spslen + 1 + 2 + i] = pps[i];
}
//Output the extradata to check
for (size_t i = 0; i != outputStream->codecpar->extradata_size; ++i)
fprintf(stderr, "\\%02x", (unsigned char)outputStream->codecpar->extradata[i]);
fprintf(stderr,"\n");
//Access the outputFormatCtx internal AVCodecContext and copy the codecpar to it
AVCodecContext *avctx= outputFormatCtx->streams[0]->codec;
fprintf(stderr,"Extradata size output stream sps pps %d\n",outputStream->codecpar->extradata_size);
if(avcodec_parameters_to_context(avctx, outputStream->codecpar) < 0 ){
fprintf(stderr,"Error avcodec_parameters_to_context");
}
//Check to see if extradata was actually transferred to OutputformatCtx internal AVCodecContext
fprintf(stderr,"Extradata size after sps pps %d\n",avctx->extradata_size);
//Write the MP4 header
if(avformat_write_header(outputFormatCtx , NULL) < 0){
fprintf(stderr,"Error avformat_write_header");
ret = 1;
} else {
extradata_written=true;
fprintf(stderr,"EXTRADATA written\n");
}
}
The resulting video file does not play. The extradata is actually stored in the tail section of the MP4 file instead of the location in the MP4 header for avc1. So it is being written by libavcodec but written likely by avformat_write_trailer.
I will post the PPS and SPS info here and the final extradata byte string just in case the error was in forming the extradata.
Here is the buffer from the hardware encoder with sps and pps preceded by the nal delimiter
\00\00\00\01\27\64\00\28\ac\2b\40\a0\cd\00\f1\22\6a\00\00\00\01\28\ee\04\f2\c0
Here is the 13 byte sps:
27640028ac2b40a0cd00f1226a
Here is the 5 byte pps:
28ee04f2c0
Here is the final extradata byte string which is 29 bytes long. I hope I wrote the PPS and SPS size correctly.
\01\64\00\28\ff\e1\0d\00\27\64\00\28\ac\2b\40\a0\cd\00\f1\22\6a\01\05\00\28\ee\04\f2\c0
I did the same conversion from NAL delimiter 0x0000001 to 4 byte NAL size for the succeeding frames from the encoder and saved them to the file sequentially and then wrote the trailer.
Any idea where the mistake is? How can I write the extradata to its proper location in the MP4 header?
Thanks,
Chris

Well, I found the problem. The raspberry pi is little endian so I assumed that I must write the sps length and pps length and each NALU size in little endian. They need to be written in big endian. After I made the change, the avcc atom showed in mp4info and mplayer can now playback the video.
It's not necessary to access the outputformatctx internal avcodeccontext and modify it.
This post was very helpful:
Possible Locations for Sequence/Picture Parameter Set(s) for H.264 Stream
Thanks,
Chris

Related

Distortion in ESP32 I2S audio playback with external DAC for sample frequency higher than 20kSps

Hardware: ESP32 DevKitV1, PCM5102 breakout board, SD-card adapter.
Software: Arduino framework.
For some time I am struggling with audio playback using a I2S DAC external to ESP32.
The problem is I can only play without distortion for low sample frequencies, i.e. below 20kSps.
I have been studying the documentation, https://docs.espressif.com/projects/esp-idf/en/latest/api-reference/peripherals/i2s.html, and numerous other sources but sill haven't managed to fix this.
I2S configuration function:
esp_err_t I2Smixer::i2sConfig(int bclkPin, int lrckPin, int dinPin, int sample_rate)
{
// i2s configuration: Tx to ext DAC, 2's complement 16-bit PCM, mono,
const i2s_config_t i2s_config = {
.mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_TX | I2S_CHANNEL_MONO), // only tx, external DAC
.sample_rate = sample_rate,
.bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
.channel_format = I2S_CHANNEL_FMT_ONLY_RIGHT, // single channel
// .channel_format = I2S_CHANNEL_FMT_RIGHT_LEFT, //2-channels
.communication_format = (i2s_comm_format_t)(I2S_COMM_FORMAT_I2S | I2S_COMM_FORMAT_I2S_MSB),
.intr_alloc_flags = ESP_INTR_FLAG_LEVEL3, // highest interrupt priority that can be handeled in c
.dma_buf_count = 128, //16,
.dma_buf_len = 128, // 64
.use_apll = false,
.tx_desc_auto_clear = true};
const i2s_pin_config_t pin_config = {
.bck_io_num = bclkPin, //this is BCK pin
.ws_io_num = lrckPin, // this is LRCK pin
.data_out_num = dinPin, // this is DATA output pin
.data_in_num = I2S_PIN_NO_CHANGE // Not used
};
esp_err_t ret1 = i2s_driver_install((i2s_port_t)i2s_num, &i2s_config, 0, NULL);
esp_err_t ret2 = i2s_set_pin((i2s_port_t)i2s_num, &pin_config);
esp_err_t ret3 = i2s_set_sample_rates((i2s_port_t)i2s_num, sample_rate);
// i2s_adc_disable((i2s_port_t)i2s_num);
// esp_err_t ret3 = rtc_clk_apll_enable(1, 15, 8, 5, 6);
return ret1 + ret2 + ret3;
}
A wave file, which was created in a 16 bit mono PCM, 44.1kHz format, is opened:
File sample_file = SD.open("/test.wav")
In the main loop, the samples are fed to the I2S driver.
esp_err_t I2Smixer::loop()
{
esp_err_t ret1 = ESP_OK, ret2 = ESP_OK;
int32_t output = 0;
if (sample_file.available())
{
if (sample_file.size() - sample_file.position() > 2) // bytes left
{
int16_t tmp; // 16 bits signed PCM assumed
sample_file.read((uint8_t *)&tmp, 2);
output =(int32_t)tmp;
}
else
{
sample_file.close();
}
}
size_t i2s_bytes_write;
int16_t int16_t_output = (int16_t)output;
ret1 = i2s_write((i2s_port_t)i2s_num, &int16_t_output, 2, &i2s_bytes_write, portMAX_DELAY);
if (i2s_bytes_write != 2)
ret2 = ESP_FAIL;
return ret1 + ret2;
}
This works fine for sample rates up to 20 kSps.
For a sample rate of 32k or 44.1k heavy distortion occurs. I suspect that this is caused by the I2S DMA Tx buffer.
If the number of DMA buffers (dma_buf_count) and the buffer length (dma_buf_len) is increased, then the sound is played fine at first. Subsequently, after a short time, the distortion kicks in again. I cannot measure this short time span, maybe around a second, but I did notice it depends on the dma_buf_count and dma_buf_len.
Next to this, I tried increasing the CPU frequency to 240MHz, no improvement.
Further I tried to play a file from SPIFSS, no improvement.
I am out of ideas right now, has anyone encountered this issue also?
Reading one sample at a time and pushing it to the I2S driver will not be the most efficient usage of the driver. You are using just 2 bytes in every 128 byte DMA buffer. That leaves just a single sample period to push the next sample before the DMA buffer is "starved".
Read the file in 128 byte (64 sample) chunks and write the whole chunk to the I2S in order to use the DMA effectively.
Depending on the file-system implementation it may be a little more efficient too to use larger chunks that are sympathetic to the file-system's media, sector size and DMA buffering.

Ambiguous process calcChecksum

CONTEXT
I'm using a code written to work with a GPS module that connects to the Arduino through serial communication. The module starts each packet with a header (0xb5, 0x62), continues with the information you requested and ends with to bytes of checksum, CK_A, and CK_B. I don't understand the code that calculates that checksum. More info about the algorithm of checksum (8-Bit Fletcher Algorithm) in the module protocol (https://www.u-blox.com/sites/default/files/products/documents/u-blox7-V14_ReceiverDescriptionProtocolSpec_%28GPS.G7-SW-12001%29_Public.pdf), page 74 (87 with index).
MORE INFO
Just wanted to understand the code, it works fine. In the UBX protocol, I mentioned there is also a piece of code that explains how it works (isn't write in c++)
struct NAV_POSLLH {
//Here goes the struct
};
NAV_POSLLH posllh;
void calcChecksum(unsigned char* CK) {
memset(CK, 0, 2);
for (int i = 0; i < (int)sizeof(NAV_POSLLH); i++) {
CK[0] += ((unsigned char*)(&posllh))[i];
CK[1] += CK[0];
}
}
In the link you provide, you can find a link to RFC 1145, containing that Fletcher 8 bit algorithm as well and explaining
It can be shown that at the end of the loop A will contain the 8-bit
1's complement sum of all octets in the datagram, and that B will
contain (n)*D[0] + (n-1)*D[1] + ... + D[n-1].
n = sizeof byte D[];
Quote adjusted to C syntax
Try it with a couple of bytes, pen and paper, and you'll see :)

How do I send a payload to an XBee

I have implemented the following code on an embedded platform that attempts to communicate with an XBee. The embedded platform that executes the code below is not an xbee:
int main()
{
char payload[12] = {0x61,0x88,0x00,0x64,0x00,0x00,0x00,0x00,0x00,0xEC,0x00,0x00}
payload[2] = 0x10;
payload[9] = 0x01;
char data = 'H'; // Send simple ASCII character to XBee
payload[11]= data;
while (1)
sendByteofData(payload,12);
}
void sendByteOfData(char * payload, int len)
{
int x;
for (x=0;x<4;x++)
// This function sends IEEE 802.15.4 frames, and I know it
// works because they are detected in the [sniffer][3].
send_IEEE_802_15_4_frame(payload,len);
}
payload[2] = payload[2] % 256 + 1;
payload[9] = payload[9] % 256 + 1;
if (payload[9] % 256 == 0 )
payload[9] = 0x01;
else
payload[9] %= 256;
}
To my surprise the above code actually sent one byte from the embedded platform to the XBee successfully. however, the infinite loop at the end of main() should have produced a stream of bytes.
My suspicion is I need to set payload[2] and payload[9] correctly, and there is probably a flaw in the incremental modulo 256 algorithm shown above.
How do I get a continuous stream of bytes?
A few thoughts...
Make your payload an array of unsigned char or, even better, uint8_t.
To update payload[2] and payload[9], simplify your code:
++payload[2];
++payload[9];
if (payload[9] == 0) payload[9] = 1;
Add a delay between sends. You might even need to wait for a response before sending the next character.
Since it's a payload of unsigned 8-bit values, they'll automatically roll from 255 to 0. I assume your special case code for payload[9] is trying to roll from 255 to 1 (instead of 0).
Make sure your payload doesn't need to include a checksum of some sort. Updating those two bytes would have an affect on a checksum byte.

How to make sense of CoreBluetooth data

I've been playing around with CoreBluetooth a lot lately, and although I can connect to some devices, I never seem to be able to properly read the data (characteristic values).
Right now I am connecting with the Wahoo BT Heartrate monitor, and I am getting all the signals but I can't make the data into anything sensible. (Yes, I am aware there is an API, but I am trying to connect without it, to properly get something working with CoreBluetooth).
I have so far not been able to turn the NSData (characteristic.value) into anything sensible. If you have any suggestions on how to make sense of this data that would be very much apreciated.
Below some code to fully parse all the HeartRate measurement characteristic data.
How to process the data depends on several things:
is the BPM written into a single byte or two?
is there EE data present?
calculate the number of RR-interval values, as there can be multiple values within in one message (I have seen up to three).
Here is the actual spec of the Heart_rate_measurement characteristic
// Instance method to get the heart rate BPM information
- (void) getHeartBPMData:(CBCharacteristic *)characteristic error:(NSError *)error
{
// Get the BPM //
// https://developer.bluetooth.org/gatt/characteristics/Pages/CharacteristicViewer.aspx?u=org.bluetooth.characteristic.heart_rate_measurement.xml //
// Convert the contents of the characteristic value to a data-object //
NSData *data = [characteristic value];
// Get the byte sequence of the data-object //
const uint8_t *reportData = [data bytes];
// Initialise the offset variable //
NSUInteger offset = 1;
// Initialise the bpm variable //
uint16_t bpm = 0;
// Next, obtain the first byte at index 0 in the array as defined by reportData[0] and mask out all but the 1st bit //
// The result returned will either be 0, which means that the 2nd bit is not set, or 1 if it is set //
// If the 2nd bit is not set, retrieve the BPM value at the second byte location at index 1 in the array //
if ((reportData[0] & 0x01) == 0) {
// Retrieve the BPM value for the Heart Rate Monitor
bpm = reportData[1];
offset = offset + 1; // Plus 1 byte //
}
else {
// If the second bit is set, retrieve the BPM value at second byte location at index 1 in the array and //
// convert this to a 16-bit value based on the host’s native byte order //
bpm = CFSwapInt16LittleToHost(*(uint16_t *)(&reportData[1]));
offset = offset + 2; // Plus 2 bytes //
}
NSLog(#"bpm: %i", bpm);
// Determine if EE data is present //
// If the 3rd bit of the first byte is 1 this means there is EE data //
// If so, increase offset with 2 bytes //
if ((reportData[0] & 0x03) == 1) {
offset = offset + 2; // Plus 2 bytes //
}
// Determine if RR-interval data is present //
// If the 4th bit of the first byte is 1 this means there is RR data //
if ((reportData[0] & 0x04) == 0)
{
NSLog(#"%#", #"Data are not present");
}
else
{
// The number of RR-interval values is total bytes left / 2 (size of uint16) //
NSUInteger length = [data length];
NSUInteger count = (length - offset)/2;
NSLog(#"RR count: %lu", (unsigned long)count);
for (int i = 0; i < count; i++) {
// The unit for RR interval is 1/1024 seconds //
uint16_t value = CFSwapInt16LittleToHost(*(uint16_t *)(&reportData[offset]));
value = ((double)value / 1024.0 ) * 1000.0;
offset = offset + 2; // Plus 2 bytes //
NSLog(#"RR value %lu: %u", (unsigned long)i, value);
}
}
}
Well, you should be implementing the Heart Rate Profile (see here), which uses the Heart Rate Service. If you look at the Heart Rate Service Specifications, you will see that the format of the Heart Rate Measurement Characteristic changes according to the flags set in the least significant octet of the data packet.
This means that you need to set up your code to handle dynamic packet sizes.
So your general process would be:
Get the first byte of the value property and check it for:
Is the heart rate measurement 8 bits or 16 bits?
Is sensor contact supported?
Is sensor contact detected?
Is Energy Expended supported?
Is RR-Interval measurement supported?
If the heart rate measurement is 8 bits (bit 0 of byte 0 is 0), then cast the next byte into its intended format (hint: it's uint8_t). If it is 16 bits (i.e. bit 0 of byte 0 is 1), then cast the next two bytes into uint16_t.
If Energy Expended is supported (bit 3 of byte 0 is 1), then cast the next byte into a uint16_t.
Do the same with RR-Intervals.
Using NSData - especially with Core Bluetooth - takes some getting used to, but it's not that bad once you grasp the concept.
Good luck!
Well...
What you'll have to do, when you read the value for the characteric :
NSData *data = [characteritic value];
theTypeOfTheData value;
[data getByte:&value lenght:sizeof(value)];
But, theTypeOfTheData can be whatever has thought the developer of the device. So it could be UInt8, UInt16, or struct (with or without bitfield)... You'll have to get info by contacting the developer of the device or looking if there is some documentation.
For example, with my colleague, we use the the type of data that consumes the less space (because the device hasn't infinite memory) according to what is need.
Look into the descriptor of the characteristic. It may points out the type of data.

Getting raw sample data of m4a file to draw waveform

I'm using AudioToolbox to access m4a audio files with following code:
UInt32 packetsToRead = 1; //Does it makes difference?
void *buffer = malloc(maxPacketSize * packetsToRead);
for (UInt64 packetIndex = 0; packetIndex < packetCount; packetIndex++)
{
ioNumberOfPackets = packetsToRead;
ioNumberOfBytes = maxPacketSize * ioNumberOfPackets;
AudioFileReadPacketData(audioFile, NO, &ioNumbersOfBytes, NULL, packetIndex, &ioNumberOFPackets, buffer);
for (UInt32 batchPacketIndex = 0; batchPacketIndex < ioNumberOfPackets; batchPacketIndex++)
{
//What to do here to get amplitude value? How to get sample value?
}
packetIndex+=ioNumberOfPackets;
}
My audio format is:
AppleM4A, 8000 Hz, 16 Bit, 4096 frames per packet
The solution was to use extended audio file services. You just have to set up transition between client format and PCM. Got the right way overthere Audio Processing: Playing with volume level.
To get waveform data, you may first need to convert your compressed audio file into raw PCM samples, such as found inside a WAV file, or other non-compressed audio format. Try AVAssetReader, et.al.