What does packing NAL packets mean? - objective-c

I have been trying to use the information from this question to solve a similar problem.
However, from the answer, I am not sure what is meant by the following:
I was not packing the raw NAL data correctly (not sure where this is
documented, if anywhere).
or even the solution to this packing issue.
To solve #2, through trial and error I found that giving my NAL units
in the following form worked:
[7 8 5] [1] [1] [1]..... [7 8 5] [1] [1] [1]..... (repeating)
Where each NAL unit is prefixed by a 32-bit start code equaling
0x00000001.
I have seen similar expressions concerning NAL packets. The original post from the link above has a statement that says:
My stream of NALs contains only SPS/PPS/IDR/P NALs (1, 5, 7, 8)
Again, what does this mean? How would I pack raw NAL data correctly in Objective-C?
Any help would be greatly appreciated.

It is called the H.264 Annex B byte stream format (defined in Annex B of the H.264 specification, ISO/IEC 14496-10).
http://wiki.multimedia.cx/?title=H.264
I think the page below has a very good explanation:
http://gentlelogic.blogspot.kr/2011/11/exploring-h264-part-2-h264-bitstream.html
There are many open source implementations of this (though not easily reusable).
A NAL unit is the unit of data that encapsulates an encoded frame or configuration data. In the Annex B format it consists of a start code, a one-byte header that carries the NAL unit type, and the payload.
SPS/PPS/IDR/P are NAL unit types:
SPS : configuration information about the overall stream (profile, level, resolution, etc.)
PPS : configuration information about the pictures (entropy coding mode, slice parameters, etc.)
IDR : a key frame the decoder can start from without any earlier data
P : an ordinary predicted frame
Typical order of NAL units in an ordinary movie stream: SPS (once) PPS (once) IDR (periodically) P (actual picture) P P P P P IDR P P P P P P P ...
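For illustration, here is a minimal C sketch (callable directly from Objective-C) of what the packing amounts to in practice: each raw NAL unit is simply prefixed with the 4-byte start code 0x00000001 and appended to the output stream in the order shown above. The helper name and buffer variables are made up for this sketch, and there is no bounds checking.

#include <stdint.h>
#include <string.h>

/* Append one raw NAL unit to an Annex B byte stream buffer,
   prefixed with the 4-byte start code 0x00000001. Sketch only. */
static size_t append_annexb_nalu(uint8_t *out, size_t off,
                                 const uint8_t *nalu, size_t nalu_len)
{
    static const uint8_t start_code[4] = { 0x00, 0x00, 0x00, 0x01 };
    memcpy(out + off, start_code, sizeof start_code);
    memcpy(out + off + sizeof start_code, nalu, nalu_len);
    return off + sizeof start_code + nalu_len;
}

/* Usage, following the order above (all buffers hypothetical):
   off = append_annexb_nalu(buf, off, sps, sps_len);
   off = append_annexb_nalu(buf, off, pps, pps_len);
   off = append_annexb_nalu(buf, off, idr, idr_len);
   off = append_annexb_nalu(buf, off, p1,  p1_len);   and so on */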
For Annex B byte stream processing, the Intel IPP code samples are a very good reference (umc_h264_nal_spl.cpp).
It's currently free to download (the latest version is free to evaluate for 30 days).
https://software.intel.com/en-us/articles/code-samples-for-intel-integrated-performance-primitives-intel-ipp-library-71
The Annex B byte stream format describes one way to store H.264 encoded frame data; some containers (such as MPEG-2 TS) carry it directly, while others do not. Handling the container format's binary data also requires a lot of hard work, since each container uses a different mechanism to specify the codec configuration. As mentioned in the related SO post (AVAssetWriterInput H.264 Passthrough to QuickTime (.mov) - Passing in SPS/PPS to create avcC atom?), mp4/mov uses the avcC box format, which is defined in a different ISO/IEC document (ISO/IEC 14496-15).

Related

Erlang binary protocol serialization

I'm currently using Erlang for a big project, but I have a question regarding the proper way to proceed.
I receive bytes over a TCP socket. The bytes follow a fixed protocol; the sender is a Python client. The Python client uses class inheritance to create the bytes from its objects.
Now I would like to (in Erlang) take the bytes and convert them to their equivalent messages; they all have a common message header.
How can I do this as generically as possible in Erlang?
Kind Regards,
Me
Pattern matching/binary header consumption using Erlang's binary syntax. But you will need to know either exactly what bytes or bits you are expecting to receive, or the field sizes in bytes or bits.
For example, let's say that you are expecting a string of bytes that will either begin with the equivalent of the ASCII strings "PUSH" or "PULL", followed by some other data you will place somewhere. You can create a function head that matches those, and captures the rest to pass on to a function that does "push()" or "pull()" based on the byte header:
operation_type(<<"PUSH", Rest/binary>>) -> push(Rest);
operation_type(<<"PULL", Rest/binary>>) -> pull(Rest).
The bytes after the first four will now be in Rest, leaving you free to interpret whatever subsequent headers or data remain in turn. You could also match on the whole binary:
operation_type(Bin = <<"PUSH", _/binary>>) -> push(Bin);
operation_type(Bin = <<"PULL", _/binary>>) -> pull(Bin).
In this case the "_" variable works like it always does -- you're just checking for the lead, essentially peeking the buffer and passing the whole thing on based on the initial contents.
You could also skip around in it. Say you knew you were going to receive a binary with 4 bytes of fluff at the front, 6 bytes of type data, and then the rest you want to pass on:
filter_thingy(<<_:4/binary, Type:6/binary, Rest/binary>>) ->
% Do stuff with Rest based on Type...
It becomes very natural to split binaries in function headers (whether the data equates to character strings or not), letting the "Rest" fall through to appropriate functions as you go along. If you are receiving Python pickle data or something similar, you would want to write the parsing routine in a recursive way, so that the conclusion of each data type returns you to the top to determine the next type, with an accumulated tree that represents the data read so far.
I only covered 8-bit bytes above, but there is also a pure bitstring syntax, which lets you go as far into the weeds with bits and bytes as you need with the same ease of syntax. Matching is a real lifesaver here.
Hopefully this informed more than confused. Binary syntax in Erlang makes this the most pleasant binary parsing environment in a general programming language I've yet encountered.
http://www.erlang.org/doc/programming_examples/bit_syntax.html

Is this GPGSV sentence valid?

While parsing the NMEA output of a GPS receiver I get the following lines:
$GPGSV,4,1,16,02,17,228,35,03,04,048,37,05,59,285,29,06,02,030,34*73
$GPGSV,4,2,16,07,58,061,46,08,80,159,40,09,11,227,32,10,51,167,47*77
$GPGSV,4,3,16,13,15,089,38,15,00,279,,16,00,018,,26,34,279,42*7A
$GPGSV,4,4,16,28,20,154,39*4C
As I understand it, from various sources on the web (e.g. here), this is wrong. According to the 3rd number, there should be 16 satellites, which was true for all those GPS receivers I previously encountered, but the sentence from this one only contains the data for 13 satellites.
Is this an error? Or do I read the specification wrongly?
NMEA is a weakly specified format. GPS chip manufacturers provide documentation on how they interpret the NMEA specification.
For example, ublox and Sirf each have a chapter of about 40 pages describing how they interpret the NMEA format.
So if you ask "Or do I read the specification wrongly?", the question is which specification you are reading: that of the GPS chip manufacturer? The NMEA 0183 spec does not contain enough info to correctly parse the sentences.
Especially in your case: the NMEA protocol does not describe how to handle empty values vs. invalid ones.
In your case the receiver theoretically expects to see 16 satellites, but found only 13.
I would have expected the 3 missing satellites to appear as empty fields ",,,,,,,,". But obviously the manufacturer decided to just stop and append the checksum string. (It is simply not specified that it is mandatory to print out empty fields for the 3 missing satellites.)
Unfortunately you have to expect to write an NMEA parser for each GPS chip manufacturer.
Therefore I always recommend using the chip manufacturer's binary protocol (e.g. uBlox binary or Sirf binary), because these are exactly specified.
You can also look at the documentation for GPSBabel: it shows how different manufacturers produce different GSV data sets.
Update:
Now that you have said it is a ublox receiver:
The answer is: yes, the NMEA sentences are valid. Look at the ublox protocol spec (I use the spec for ublox 5).
On the page where the GSV sentence is described, look at the "Message Structure":
{,sv,elv,az,cno}*cs
The curly braces enclose the sequence that is repeated.
And below, look at "1..4": this means 1, 2, 3 or 4 blocks. It does not say "4", it says "1..4", so the satellite info blocks are optional and do not have to be padded with empty fields.
If you further look at the example ublox gives, you will see that the last GPGSV message contains fewer than 4 satellites, exactly as in your question.
Yes, it's inconsistent; the last message should have described more than one satellite (four, actually) so as to total the 16 advertised. The GPS receiver should have reported at least the satellite IDs (PRN), even if their viewing direction in the sky and SNR were unknown at the time, e.g.: {,01,,,}.
That being said, it's better to write programs tolerant against ill-formed messages; in this case, updating the number of satellites in view to 13, as counted.
(I've checked the checksums and they're okay.)
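For reference, the checksum rule itself is simple: it is the XOR of every character between '$' and '*', written as two hexadecimal digits. A minimal C sketch:

#include <stdio.h>

/* Compute the NMEA checksum: XOR of all characters between '$' and '*'. */
static unsigned char nmea_checksum(const char *sentence)
{
    unsigned char cs = 0;
    if (*sentence == '$')
        sentence++;                          /* skip the leading '$' */
    while (*sentence && *sentence != '*')
        cs ^= (unsigned char)*sentence++;
    return cs;
}

int main(void)
{
    const char *s = "$GPGSV,4,4,16,28,20,154,39*4C";
    printf("computed: %02X\n", nmea_checksum(s));   /* prints 4C for this sentence */
    return 0;
}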

Publishing a stream using librtmp in C/C++

How do I publish a stream using the librtmp library?
I read the librtmp man page, and for publishing, RTMP_Write() is used.
I am doing it like this:
//Code
//Init RTMP code
RTMP *r;
char uri[]="rtmp://localhost:1935/live/desktop";
r= RTMP_Alloc();
RTMP_Init(r);
RTMP_SetupURL(r, (char*)uri);
RTMP_EnableWrite(r);
RTMP_Connect(r, NULL);
RTMP_ConnectStream(r,0);
Then, to respond to ping and other messages from the server, I am using a thread like the following:
//Thread
while (ThreadIsRunning && RTMP_IsConnected(r) && RTMP_ReadPacket(r, &packet))
{
    if (RTMPPacket_IsReady(&packet))
    {
        if (!packet.m_nBodySize)
            continue;

        RTMP_ClientPacket(r, &packet); // This takes care of handling ping/other messages
        RTMPPacket_Free(&packet);
    }
}
After this, I am stuck on how to use RTMP_Write() to publish a file to the Wowza media server.
In my own experience, streaming video data to an RTMP server is actually pretty simple on the librtmp side. The tricky part is to correctly packetize video/audio data and read it at the correct rate.
Assuming you are using FLV video files, as long as you can correctly isolate each tag in the file and send each one using one RTMP_Write call, you don't even need to handle incoming packets.
The tricky part is to understand how FLV files are made.
The official specification is available here: http://www.adobe.com/devnet/f4v.html
First, there's a header that is made of 9 bytes. This header must not be sent to the server; it is only read through in order to make sure the file really is FLV.
Then there is a stream of tags. Each tag has an 11-byte header that contains the tag type (video/audio/metadata), the body length, and the tag's timestamp, among other things.
The tag header can be described using this structure:
typedef struct __flv_tag {
    uint8     type;
    uint24_be body_length;        /* in bytes, total tag size minus 11 */
    uint24_be timestamp;          /* milliseconds */
    uint8     timestamp_extended; /* timestamp extension */
    uint24_be stream_id;          /* reserved, must be "\0\0\0" */
    /* body comes next */
} flv_tag;
The body length and timestamp are presented as 24-bit big-endian integers, with a supplementary byte to extend the timestamp to 32 bits if necessary (that happens approximately at the 4-hour mark).
Once you have read the tag header, you can read the body itself as you now know its length (body_length).
After that there is a 32-bit big endian integer value that contains the complete length of the tag (11 bytes + body_length).
You must write the tag header + body + previous tag size in one RTMP_Write call (else it won't play).
Also, be careful to send packets at the nominal frame rate of the video, else playback will suffer greatly.
I have written a complete FLV file demuxer as part of my GPL project FLVmeta that you can use as reference.
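To make the flow concrete, here is a rough C sketch of the loop described above: skip the 9-byte FLV header (plus the first PreviousTagSize field), then read each tag and push tag header + body + PreviousTagSize through a single RTMP_Write call. It assumes an already-connected, writable RTMP session r; the timestamp-based pacing and error handling a real implementation needs are omitted.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <librtmp/rtmp.h>

/* Sketch: stream an FLV file tag by tag over an already-connected,
   writable RTMP session. Minimal error handling on purpose. */
static int stream_flv(RTMP *r, const char *path)
{
    FILE *f = fopen(path, "rb");
    if (!f) return -1;

    uint8_t flv_header[9 + 4];     /* 9-byte FLV header + first PreviousTagSize */
    if (fread(flv_header, 1, sizeof(flv_header), f) != sizeof(flv_header)) goto fail;
    /* The FLV header is only read to skip it; it must not be sent to the server. */

    for (;;) {
        uint8_t tag_header[11];
        if (fread(tag_header, 1, 11, f) != 11) break;             /* end of file */

        /* body_length is a 24-bit big-endian integer in bytes 1..3 of the header */
        uint32_t body_length = (tag_header[1] << 16) | (tag_header[2] << 8) | tag_header[3];
        uint32_t tag_size = 11 + body_length + 4;                 /* header + body + PreviousTagSize */

        uint8_t *tag = malloc(tag_size);
        if (!tag) goto fail;
        memcpy(tag, tag_header, 11);
        if (fread(tag + 11, 1, body_length + 4, f) != body_length + 4) { free(tag); break; }

        /* One complete tag per call, as described above. */
        if (RTMP_Write(r, (const char *)tag, (int)tag_size) <= 0) { free(tag); goto fail; }
        free(tag);

        /* A real implementation would pace writes according to the tag timestamps. */
    }
    fclose(f);
    return 0;
fail:
    fclose(f);
    return -1;
}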
In fact, RTMP_Write() seems to require that you already have the RTMP packet formed in buf.
RTMPPacket *pkt = &r->m_write;
...
pkt->m_packetType = *buf++;
So, you cannot just push the FLV data there; you need to separate it into packets first.
There is a nice function, RTMP_ReadPacket(), but it reads from the network socket.
I have the same problem as you, hope to have a solution soon.
Edit:
There are certain bugs in RTMP_Write(). I've made a patch and now it works. I'm going to publish that.

Reading SWF Header with Objective-C

I am trying to read the header of an SWF file using NSData.
According to the SWF format specification, I need to access the movie's width and height by reading bits, not bytes, and I couldn't find a way to do that in Obj-C.
Bytes 9 thru ?: Here is stored a RECT (bounds of movie). It must be read in binary form. First of all, we will transform the first byte to binary: "01100000"
The first 5 bits will tell us the size in bits of each stored value: "01100" = 12
So, we have 4 fields of 12 bits = 48 bits
48 bits + 5 bits (header of RECT) = 53 bits
Fill to complete bytes with zeroes, till we reach a multiple of 8. 53 bits + 3 alignment bits = 56 bits (this RECT is 7 bytes long, 7 * 8 = 56)
I use this formula to determine all this stuff:
Where do I start?
ObjC is a superset of C: You can run C code alongside ObjC with no issues.
Thus, you could use a C-based library like libming to read bytes from your SWF file.
If you need to shuffle bytes into an NSData object, look into the +dataWithBytes:length: method.
Start by looking for code with a compatible license that already does what you want. C libraries can be used from Obj-C code simply by linking them in (or arranging for them to be dynamically linked in) and then calling their functions.
Failing that, start by looking at the Binary Data Programming Guide for Cocoa and the NSData Class Reference. You'd want to pull out the bytes that contain the bits you're interested in, then use bit-masking techniques to extract the bits you care about. You might find the BitTst(), BitSet(), and BitClr() functions and their friends useful, if they're still there in Snow Leopard; I'm not sure whether they ended up in the démodé parts of Carbon or not. There are also the POSIX setbit(), clrbit(), isset(), and isclr() macros defined in <sys/param.h>. Then, finally, there are the C bitwise operators: ^, |, &, ~, <<, and >>.
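To illustrate the bit-masking approach, here is a small C sketch (callable from Objective-C) that pulls the width and height out of the RECT starting at byte offset 8 of an uncompressed ("FWS") SWF file; compressed ("CWS") files would have to be inflated first. The bit-reader helper is made up for this sketch, and the RECT values are in twips (1/20 of a pixel).

#include <stdint.h>
#include <stddef.h>

/* Minimal MSB-first bit reader over a byte buffer (hypothetical helper). */
typedef struct {
    const uint8_t *data;
    size_t bitpos;
} BitReader;

static uint32_t read_bits(BitReader *br, unsigned count)
{
    uint32_t value = 0;
    while (count--) {
        uint8_t byte = br->data[br->bitpos / 8];
        value = (value << 1) | ((byte >> (7 - (br->bitpos % 8))) & 1);
        br->bitpos++;
    }
    return value;
}

/* Parse the RECT that starts at byte offset 8 (the question's "byte 9"):
   5 bits for nbits, then four fields of nbits bits each, in twips. */
static void swf_movie_size(const uint8_t *swf, uint32_t *width_px, uint32_t *height_px)
{
    BitReader br = { swf + 8, 0 };
    uint32_t nbits = read_bits(&br, 5);
    uint32_t xmin = read_bits(&br, nbits);
    uint32_t xmax = read_bits(&br, nbits);
    uint32_t ymin = read_bits(&br, nbits);
    uint32_t ymax = read_bits(&br, nbits);
    *width_px  = (xmax - xmin) / 20;    /* twips to pixels */
    *height_px = (ymax - ymin) / 20;
}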

AVSampleBufferDisplayLayer renders half of each frame when using high preset

I am using AVSampleBufferDisplayLayer to display video that is being streamed over the network. On the sending side an AVCaptureSession is used to capture CMSampleBuffers, which are serialized into NAL units and streamed to the receiver, which then turns them back into CMSampleBuffers and feeds them to AVSampleBufferDisplayLayer (as is described for instance here). It works quite well: I can see the video and it streams more or less smoothly.
If I set the capture session's sessionPreset to AVCaptureSessionPresetHigh the video shown on the receiving side is cut in half - the top half displays the video from the sender while the bottom half is a solid dark green. If I use any other preset (e.g. AVCaptureSessionPresetMedium or AVCaptureSessionPreset1280x720) the video displays in its entirety.
Has anyone encountered such an issue, or has any idea what might cause it?
I tried examining the data at the source as well as the data at the destination, to see if I can determine where the image is being chopped off, but I have not been successful. It occurred to me that perhaps the high quality frame is being split into more than one NALU and I am not putting it back together correctly. Is that possible? What does such splitting look like at the elementary-stream level (if it is possible at all)?
Thank you
Amos
The problem turned out to be that at the preset AVCaptureSessionPresetHigh one frame would get split into more than one type 5 (or type 1) NALU. On the receiving side I was combining the SPS, PPS and type 1 (or type 5) NAL units into a CMSampleBuffer, but ignoring the second part of the frame if it was split, which caused the problem.
In order to recognize whether two successive NAL units belong to the same frame it is necessary to parse the slice header of the picture NAL units. This requires delving into the specification, but goes more or less like this: the first field of the slice header is first_mb_in_slice, which is Exp-Golomb encoded. Next come slice_type and pic_parameter_set_id, also Exp-Golomb encoded, and finally frame_num, as an unsigned integer of (log2_max_frame_num_minus4 + 4) bits (to get the value of log2_max_frame_num_minus4 it is necessary to parse the SPS referenced by this frame's PPS). If two consecutive NAL units have the same frame_num they are part of the same frame and should be put into the same CMSampleBuffer.
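As a rough illustration of that parsing step, here is a hedged C sketch of reading frame_num from a slice NAL unit. It assumes the NAL payload has already had its emulation-prevention bytes removed, it ignores separate_colour_plane_flag (only present in 4:4:4 profiles), and the helper names are made up.

#include <stdint.h>
#include <stddef.h>

/* MSB-first bit reader over an RBSP buffer (emulation-prevention bytes stripped). */
typedef struct { const uint8_t *p; size_t bit; } SliceBits;

static uint32_t sb_get(SliceBits *b, unsigned n)
{
    uint32_t v = 0;
    while (n--) {
        v = (v << 1) | ((b->p[b->bit >> 3] >> (7 - (b->bit & 7))) & 1);
        b->bit++;
    }
    return v;
}

/* Unsigned Exp-Golomb, ue(v): count leading zero bits, then read that many bits. */
static uint32_t sb_get_ue(SliceBits *b)
{
    unsigned zeros = 0;
    while (sb_get(b, 1) == 0)
        zeros++;
    return (1u << zeros) - 1 + sb_get(b, zeros);
}

/* Extract frame_num from a type 1 or type 5 NAL unit.
   log2_max_frame_num = log2_max_frame_num_minus4 + 4, taken from the SPS. */
static uint32_t slice_frame_num(const uint8_t *nalu, unsigned log2_max_frame_num)
{
    SliceBits b = { nalu + 1, 0 };          /* skip the 1-byte NAL unit header */
    (void)sb_get_ue(&b);                    /* first_mb_in_slice */
    (void)sb_get_ue(&b);                    /* slice_type */
    (void)sb_get_ue(&b);                    /* pic_parameter_set_id */
    return sb_get(&b, log2_max_frame_num);  /* frame_num */
}

Two consecutive slice NAL units with the same frame_num belong to the same frame and can be combined into one CMSampleBuffer.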