How to find the first NAL unit of a complete video frame in a raw HEVC/H.265 stream?

In a raw HEVC/H.265 elementary stream, how can I find the first NAL unit of a video frame? The access unit delimiter (access_unit_delimiter_rbsp()) seems to be a good choice, but it is optional in the video stream.

I think you should read the HEVC specification. I never did, but I can help a little based on my experience with the HM codec.
At the beginning, the current position in the bitstream is 0. The decoder extracts and discards all zero bytes (leading_zero_8bits) until it finds a start code, 0x00000001 or 0x000001, and then discards those 4 or 3 bytes (zero_byte and start_code_prefix_one_3bytes). It then reads bytes in a loop until the next three bytes are 0x000000 or 0x000001, or the stream ends; everything read up to that point is the NAL unit data. If the stream has not ended and the next bytes are not 0x00000001 or 0x000001, it discards zero bytes (trailing_zero_8bits) until it finds a start code, which marks a new NAL unit, or reaches the end of the stream. Note that a frame can be represented by more than one NAL unit, so a start code alone does not mark a frame boundary.
After that, it converts the NAL unit payload to an RBSP by removing each emulation prevention byte (the 0x03 in 0x000003) and any cabac_zero_word before decoding starts.
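To connect the scanning procedure above to the original question, here is a minimal sketch in C. It rests on two facts from the HEVC syntax: the NAL unit header is 2 bytes with nal_unit_type in bits 1..6 of the first byte, and a slice segment header starts with first_slice_segment_in_pic_flag. The function names are hypothetical, and the sketch only finds the first slice of a picture. Non-VCL NAL units such as an AUD, VPS/SPS/PPS or prefix SEI that precede that slice belong to the same access unit, so a real splitter would have to look back over them.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the first byte after the next 00 00 01 start code
 * prefix found at or after `pos`, or `len` if there is none.
 * A 4-byte start code (00 00 00 01) is matched through its last 3 bytes. */
static size_t next_nal_start(const uint8_t *buf, size_t len, size_t pos)
{
    for (size_t i = pos; i + 3 <= len; i++) {
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1)
            return i + 3;
    }
    return len;
}

/* nal_unit_type occupies bits 1..6 of the first header byte. */
static int hevc_nal_type(const uint8_t *nal)
{
    return (nal[0] >> 1) & 0x3F;
}

/* Returns 1 if the NAL unit is a VCL NAL unit (types 0..31) whose slice
 * segment header has first_slice_segment_in_pic_flag set. That flag is the
 * most significant bit of the byte following the 2-byte NAL header. */
static int hevc_is_first_slice_of_picture(const uint8_t *nal, size_t nal_len)
{
    if (nal_len < 3)
        return 0;
    if (hevc_nal_type(nal) > 31)
        return 0;
    return (nal[2] & 0x80) != 0;
}
```

Calling next_nal_start() repeatedly walks the stream; the first NAL unit for which hevc_is_first_slice_of_picture() returns 1 after a previous picture's slices is where a new frame's coded data begins.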

Related

A few questions about the startcode of NALU

I am a beginner studying MPEG-4, and some definitions confuse me.
It is said that if a NALU slice is the first slice of a frame, then the NALU start code is the 4 bytes "\x00\x00\x00\x01"; otherwise it is the 3 bytes "\x00\x00\x01". Is that mandatory? Android's MPEG4Writer always seems to use 4 bytes.
Is it possible for a NALU slice to end with "\x00"? If so, how can we determine whether this "\x00" belongs to the preceding NALU or the following one?
No. 3-byte start codes are never required; they can be used to save a little space, but using 4 bytes everywhere is fine.
No. Every NALU ends with a stop bit (a 1 bit followed by zero padding up to the byte boundary), so its last byte is guaranteed never to be 0. Any zero bytes in front of a start code therefore belong to the start code or to padding, not to the preceding NALU.
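Because of the stop bit, a parser can settle the ownership of zero bytes mechanically. The following is a small sketch in C, with a hypothetical function name: given the span between the start of a NALU and the next 00 00 01 prefix, it backs up over any zero bytes, which can only be padding or the leading zero of a 4-byte start code.

```c
#include <stddef.h>
#include <stdint.h>

/* buf[begin, end) spans from the first byte of a NALU up to the next
 * 00 00 01 prefix (or the end of the stream). Returns the offset one past
 * the last byte that really belongs to the NALU. */
static size_t trim_trailing_zeros(const uint8_t *buf, size_t begin, size_t end)
{
    while (end > begin && buf[end - 1] == 0)
        end--;
    return end;
}
```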

Creating a WAV file with an arbitrary bits per sample value?

Do WAV files allow any arbitrary number of bitsPerSample?
I have failed to get it to work with anything less than 8. I am not sure how to define the blockAlign for one thing.
Dim ss As New Speech.Synthesis.SpeechSynthesizer
Dim info As New Speech.AudioFormat.SpeechAudioFormatInfo(AudioFormat.EncodingFormat.Pcm, 5000, 4, 1, 2500, 1, Nothing) ' FAILS
ss.SetOutputToWaveFile("TEST4bit.wav", info)
ss.Speak("I am 4 bit.")
My.Computer.Audio.Play("TEST4bit.wav")
AFAIK no. 4-bit PCM is not a defined format, and it would not make much sense anyway: with only 16 amplitude levels the quality would be horrible.
While it is technically possible, I know of no decent software (e.g. Wavelab) that supports it, though your very own player could.
Formula: blockAlign = channels * (bitsPerSample / 8)
So for mono 4-bit it would be: blockAlign = 1 * ((double)4 / 8) = 0.5
Note that the cast to double is necessary to avoid ending up with 0 from integer division.
But if you look at the block align definition below, an alignment of 0.5 bytes does not make much sense (and the field is an integer, so the value cannot even be stored). One would have to work at the bit level, which is painful and useless because at this quality uncompressed PCM would just sound horrible:
wBlockAlign
The block alignment (in bytes) of the waveform data. Playback
software needs to process a multiple of wBlockAlign bytes of data at
a time, so the value of wBlockAlign can be used for buffer
alignment.
Reference:
http://www-mmsp.ece.mcgill.ca/Documents/AudioFormats/WAVE/Docs/riffmci.pdf page 59
Workaround:
If you really need 4-bit, switch to ADPCM format.
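For comparison, here is a sketch of the block alignment calculation done in integers. It assumes each sample is padded up to a whole number of bytes, so a 4-bit sample would occupy one byte; that padding convention and the function name are assumptions for illustration. Whether a given player accepts such a file is a separate matter.

```c
#include <stdint.h>

/* Bytes per sample frame, rounding each sample up to whole bytes.
 * Plain integer division, channels * (bits / 8), would give 0 for 4 bits. */
static uint16_t wav_block_align(uint16_t channels, uint16_t bits_per_sample)
{
    uint16_t bytes_per_sample = (uint16_t)((bits_per_sample + 7) / 8);
    return (uint16_t)(channels * bytes_per_sample);
}
```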

What does packing nal packets mean?

I have been trying to use the information from this question to solve a similar problem.
However, from the answer; I am not sure what is meant by the following:
I was not packing the raw NAL data correctly (not sure where this is
documented, if anywhere).
or even the solution to this packing issue.
To solve #2, through trial and error I found that giving my NAL units
in the following form worked:
[7 8 5] [1] [1] [1]..... [7 8 5] [1] [1] [1]..... (repeating)
Where each NAL unit is prefixed by a 32-bit start code equaling
0x00000001.
I have seen similar expressions concerning NAL packets. The original post linked above says:
My stream of NALs contains only SPS/PPS/IDR/P NALs (1, 5, 7, 8)
Again, what does this mean? How would I pack raw NAL data correctly in Objective-C?
Any help would be greatly appreciated.
It is called the H.264 Annex B byte stream format (defined in ITU-T H.264, which is also published as ISO/IEC 14496-10).
http://wiki.multimedia.cx/?title=H.264
I think the page below has a very good explanation:
http://gentlelogic.blogspot.kr/2011/11/exploring-h264-part-2-h264-bitstream.html
There are many open source implementations of this (though they are not easily reusable).
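A NAL unit (NALU) is the unit of data that encapsulates encoded video data. In the Annex B byte stream it consists of a start code, a one-byte header that carries the NAL unit type, and the payload; there is no length field (length prefixes are used in MP4-style containers instead).
SPS/PPS/IDR/P are NAL unit types; the numbers 7, 8, 5 and 1 in your quotes are their nal_unit_type values.
SPS (7): config information about the overall sequence (profile, level, picture width and height, etc.)
PPS (8): config information about individual pictures (entropy coding mode, initial quantization parameters, etc.)
IDR (5): slice of a key frame that can be decoded without reference to earlier frames
P (1): slice of a usual (non-IDR) encoded frame
Order of NAL units in ordinary movie files: SPS (once) PPS (once) IDR (periodically) P (actual picture) P P P P P IDR P P P P P P P ... (a live stream often repeats SPS and PPS before every IDR, as in the [7 8 5] pattern quoted above)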
For Annex B byte stream processing, the Intel IPP code samples are a very good reference (umc_h264_nal_spl.cpp).
They are currently free to download (the latest version is free to evaluate for 30 days).
https://software.intel.com/en-us/articles/code-samples-for-intel-integrated-performance-primitives-intel-ipp-library-71
The Annex B byte stream format describes how to delimit NAL units in a raw H.264 stream; it is also the form carried in MPEG-2 TS. Containers such as MP4 store the same NAL units differently, and handling the container's binary format takes considerable extra work, because each container uses a different mechanism to specify the codec configuration. As mentioned in a related SO post (AVAssetWriterInput H.264 Passthrough to QuickTime (.mov) - Passing in SPS/PPS to create avcC atom?), mp4/mov uses the avcC box format, which is defined in another ISO/IEC document (ISO/IEC 14496-15).
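As an illustration of what "packing" means in the quoted solution, here is a minimal sketch in C that prefixes one raw NAL unit with the 32-bit start code 0x00000001 and reads its type from the header byte. The function names are hypothetical. The question asks about Objective-C, where plain C like this can be used unchanged.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* H.264 NAL header is one byte; nal_unit_type is its low 5 bits
 * (7 = SPS, 8 = PPS, 5 = IDR slice, 1 = non-IDR slice). */
static int h264_nal_type(const uint8_t *nal)
{
    return nal[0] & 0x1F;
}

/* Writes 00 00 00 01 followed by the NAL unit into `out`.
 * Returns the number of bytes written, or 0 if `cap` is too small. */
static size_t annexb_put_nal(uint8_t *out, size_t cap,
                             const uint8_t *nal, size_t nal_len)
{
    static const uint8_t start_code[4] = {0, 0, 0, 1};
    if (cap < sizeof start_code + nal_len)
        return 0;
    memcpy(out, start_code, sizeof start_code);
    memcpy(out + sizeof start_code, nal, nal_len);
    return sizeof start_code + nal_len;
}
```

Calling annexb_put_nal() for SPS, PPS, the IDR slice and then each following slice, appending to the same buffer, produces the [7 8 5] [1] [1] ... layout described in the question.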

File (.wav) duration while writing PCM data @16KBps

I am writing some silent PCM data to a file @16KBps. The file is in .wav format. For this I have the following code:
#define DEFAULT_BITRATE 16000
long LibGsmManaged::addSilence()
{
    char silenceBuf[DEFAULT_BITRATE];
    if (fout) {
        for (int i = 0; i < DEFAULT_BITRATE; i++) {
            silenceBuf[i] = '\0';
        }
        fwrite(silenceBuf, sizeof(silenceBuf), 1, fout);
    }
    return ftell(fout);
}
Updated:
Here is how I write the header
void LibGsmManaged::write_wave_header()
{
    if (fout) {
        fwrite("RIFF", 4, 1, fout);
        total_length_pos = ftell(fout);
        write_int32(0);             // RIFF chunk size (placeholder)
        fwrite("WAVE", 4, 1, fout);
        fwrite("fmt ", 4, 1, fout);
        write_int32(16);            // fmt chunk size
        write_int16(1);             // audio format: 1 = PCM
        write_int16(1);             // channels: mono
        write_int32(8000);          // sample rate (Hz)
        write_int32(16000);         // byte rate (bytes per second)
        write_int16(2);             // block align (bytes per sample frame)
        write_int16(16);            // bits per sample
        fwrite("data", 4, 1, fout);
        data_length_pos = ftell(fout);
        write_int32(0);             // data chunk size (placeholder)
    }
    else {
        std::cout << "File pointer not correctly initialized";
    }
}
void LibGsmManaged::write_int32(int value)
{
    if (fout) {
        fwrite((const char*)&value, sizeof(value), 1, fout);
    }
    else {
        std::cout << "File pointer not correctly initialized";
    }
}
I run this code on my iOS device using NSTimer with interval 1.0 sec. So AFAIK, if I run this for 60 sec, I should get a file.wav that when played should show 60 sec as its duration (again AFAIK). But in actual test it displays almost double duration i.e. 2 min. (approx). I have also tested that when I change the DEFAULT_BITRATE to 8000, then the file duration is almost correct.
I am unable to identify what is going on here. Am I missing something bad here? I hope my code is not wrong.
What you're trying to do (write your own WAV files) should be totally doable. That's the good news. However, I'm a bit confused about your exact parameters and constraints, as are many others in the comments, which is why they have been trying to flesh out the details.
You want to write raw, uncompressed, silent PCM to a WAV file. Okay. How wide does the PCM data need to be? You are creating an array of chars that you are writing to the file. A char is an 8-bit byte. Is that what you want? If so, then you need to use a silent center point of 0x80 (128). 8-bit PCM in WAV files is unsigned, i.e., 0..255, and 128 is silent.
If you intend to store silent 16-bit data, that will be signed data, so the center point (between -32768 and 32767) is 0. Also, it will be stored in little endian byte format. But since it's silence (all 0s), that doesn't matter.
The title of your question indicates (and the first sentence reiterates) that you want to write data at 16 kbps. Are you sure you want raw 16 kbps audio? That's 16 kiloBITs per second, or 16000 bits per second. Depending on whether you are writing 8- or 16-bit PCM samples, that only allows for 2000 or 1000 Hz audio, which is probably not what you want. Did you mean 16 kHz audio? 16 kHz audio translates to 16000 audio samples per second, which more closely aligns with your code. Then again, your code mentions GSM (LibGsmManaged), so maybe you are looking for 16 kbps audio. But I'll assume we're proceeding along the raw PCM route.
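The arithmetic in the paragraph above can be checked with a small sketch in C. The function names are hypothetical. The second helper shows how a duration follows from the amount of audio data and the byte rate declared in the header: 60 writes of 16000 bytes last 60 seconds at a byte rate of 16000, but 120 seconds at a byte rate of 8000.

```c
/* Sample rate (Hz) that a raw PCM stream of the given bit rate can carry. */
static unsigned pcm_sample_rate_for_bitrate(unsigned bits_per_second,
                                            unsigned bits_per_sample,
                                            unsigned channels)
{
    return bits_per_second / (bits_per_sample * channels);
}

/* Playback duration implied by the data size and the header's byte rate. */
static double pcm_duration_seconds(unsigned long data_bytes,
                                   unsigned long byte_rate)
{
    return (double)data_bytes / (double)byte_rate;
}
```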
Do you know in advance how many seconds of audio you need to write? That makes this process really easy. As you may have noticed, the WAV header needs length information in a few spots. You either write it in advance (if you know the values) or fill it in later (if you are writing an indeterminate amount).
Let's assume you are writing 2 seconds of raw, monophonic, 16000 Hz, 16-bit PCM to a WAV file. The center point is 0x0000.
WAV writing process:
1. Write 'RIFF'
2. Write 32-bit file size, which will be 36 (header size - first 8 bytes) + 64000 (see step 12 about that number)
3. Write 'WAVEfmt ' (with space)
4. Write 32-bit format header size (16)
5. Write 16-bit audio format (1 indicating raw PCM audio)
6. Write 16-bit channel count (1 because it's monophonic)
7. Write 32-bit sample rate (number of audio samples per second = 16000)
8. Write 32-bit byte rate (number of bytes per second = 32000)
9. Write 16-bit block alignment (2 bytes per sample * 1 channel = 2)
10. Write 16-bit bits per sample (16)
11. Write 'data'
12. Write 32-bit length of audio payload data (16000 samples/second * 2 bytes/sample * 2 seconds = 64000 bytes)
13. Write 64000 bytes, all 0 values
If you need to write a dynamic amount of audio data, leave the length field from steps 2 and 12 as 0, then seek back after you're done writing and fill those in. I'm not convinced that your original code was writing the length fields correctly. Some playback software might ignore those, others might not, so you could have gotten varying results.
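The header layout described in the WAV writing process can be sketched in C as follows. This is an illustrative sketch, not the asker's code: the function names are hypothetical, and it builds the 44 header bytes in memory, writing each field byte by byte in little-endian order so the result does not depend on the host's int size or endianness.

```c
#include <stdint.h>
#include <string.h>

static void put_le16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)(v >> 8);
}

static void put_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xFF);
    p[1] = (uint8_t)((v >> 8) & 0xFF);
    p[2] = (uint8_t)((v >> 16) & 0xFF);
    p[3] = (uint8_t)(v >> 24);
}

/* Fills a canonical 44-byte PCM WAV header. `bits` is assumed to be a
 * multiple of 8; `data_bytes` is the length of the audio payload. */
static void wav_build_header(uint8_t h[44], uint32_t sample_rate,
                             uint16_t channels, uint16_t bits,
                             uint32_t data_bytes)
{
    uint16_t block_align = (uint16_t)(channels * (bits / 8));
    uint32_t byte_rate = sample_rate * block_align;

    memcpy(h, "RIFF", 4);
    put_le32(h + 4, 36 + data_bytes);   /* file size minus first 8 bytes */
    memcpy(h + 8, "WAVEfmt ", 8);
    put_le32(h + 16, 16);               /* format header size */
    put_le16(h + 20, 1);                /* 1 = raw PCM */
    put_le16(h + 22, channels);
    put_le32(h + 24, sample_rate);
    put_le32(h + 28, byte_rate);
    put_le16(h + 32, block_align);
    put_le16(h + 34, bits);
    memcpy(h + 36, "data", 4);
    put_le32(h + 40, data_bytes);       /* payload length */
}
```

For a file of unknown length, the same function can be called twice: once with data_bytes set to 0 before writing audio, and again after seeking back to offset 0 once the final payload size is known.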
Hope that helps! If you know Python, here's another question I answered which describes how to write a WAV file using Python's struct library (I referred to that code fragment a lot while writing the steps above).

Publishing a stream using librtmp in C/C++

How do I publish a stream using the librtmp library?
I read the librtmp man page, and for publishing, RTMP_Write() is used.
This is what I am doing:
//Code
//Init RTMP code
RTMP *r;
char uri[]="rtmp://localhost:1935/live/desktop";
r= RTMP_Alloc();
RTMP_Init(r);
RTMP_SetupURL(r, (char*)uri);
RTMP_EnableWrite(r);
RTMP_Connect(r, NULL);
RTMP_ConnectStream(r,0);
Then, to respond to ping and other messages from the server, I use a thread like the following:
//Thread
while (ThreadIsRunning && RTMP_IsConnected(r) && RTMP_ReadPacket(r, &packet))
{
    if (RTMPPacket_IsReady(&packet))
    {
        if (!packet.m_nBodySize)
            continue;
        RTMP_ClientPacket(r, &packet); //This takes care of handling ping/other messages
        RTMPPacket_Free(&packet);
    }
}
After this I am stuck at how to use RTMP_Write() to publish a file to Wowza media server?
In my own experience, streaming video data to an RTMP server is actually pretty simple on the librtmp side. The tricky part is to correctly packetize video/audio data and read it at the correct rate.
Assuming you are using FLV video files, as long as you can correctly isolate each tag in the file and send each one using one RTMP_Write call, you don't even need to handle incoming packets.
The tricky part is to understand how FLV files are made.
The official specification is available here: http://www.adobe.com/devnet/f4v.html
First, there's a header, that is made of 9 bytes. This header must not be sent to the server, but only read through in order to make sure the file is really FLV.
Then there is a stream of tags. Each tag has an 11-byte header that contains the tag type (video/audio/metadata), the body length, and the tag's timestamp, among other things.
The tag header can be described using this structure:
typedef struct __flv_tag {
    uint8 type;
    uint24_be body_length; /* in bytes, total tag size minus 11 */
    uint24_be timestamp; /* milli-seconds */
    uint8 timestamp_extended; /* timestamp extension */
    uint24_be stream_id; /* reserved, must be "\0\0\0" */
    /* body comes next */
} flv_tag;
The body length and timestamp are 24-bit big-endian integers, with a supplementary byte that extends the timestamp to 32 bits when necessary (a 24-bit millisecond count overflows after about 4 hours 40 minutes).
Once you have read the tag header, you can read the body itself as you now know its length (body_length).
After that there is a 32-bit big endian integer value that contains the complete length of the tag (11 bytes + body_length).
You must write the tag header + body + previous tag size in one RTMP_Write call (else it won't play).
Also, be careful to send packets at the nominal frame rate of the video, else playback will suffer greatly.
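The tag layout described above can be parsed with a few lines of C. This is a minimal sketch with hypothetical names, separate from FLVmeta and from librtmp: it decodes the 11-byte header and returns the size of the whole unit (header + body + 4-byte previous tag size), which is the span the answer says to hand to a single RTMP_Write call. The timestamp could also be used to pace sending.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint8_t  type;          /* 8 = audio, 9 = video, 18 = script data */
    uint32_t body_length;   /* bytes of body following the header */
    uint32_t timestamp_ms;  /* extension byte placed in bits 24..31 */
} flv_tag_info;

/* Reads a 24-bit big-endian integer. */
static uint32_t be24(const uint8_t *p)
{
    return ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | (uint32_t)p[2];
}

/* Parses the tag header at `p`. Returns the total number of bytes occupied
 * by header + body + previous tag size, or 0 if fewer than 11 bytes are
 * available. The caller must still check that the body itself is present. */
static size_t flv_parse_tag(const uint8_t *p, size_t avail, flv_tag_info *out)
{
    if (avail < 11)
        return 0;
    out->type = p[0] & 0x1F;            /* low 5 bits hold the tag type */
    out->body_length = be24(p + 1);
    out->timestamp_ms = be24(p + 4) | ((uint32_t)p[7] << 24);
    return 11 + (size_t)out->body_length + 4;
}
```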
I have written a complete FLV file demuxer as part of my GPL project FLVmeta that you can use as reference.
In fact, RTMP_Write() seems to require that you already have the RTMP packet formed in buf.
RTMPPacket *pkt = &r->m_write;
...
pkt->m_packetType = *buf++;
So, you cannot just push the FLV data there; you need to split it into packets first.
There is a nice function, RTMP_ReadPacket(), but it reads from the network socket.
I have the same problem as you, hope to have a solution soon.
Edit:
There are certain bugs in RTMP_Write(). I've made a patch and now it works. I'm going to publish that.