How can I (or is it possible to) convert the AVC codec profile and level to the MIME codec definition? - html5-video

In my use-case I have to provide codec specification within the HTML5 video source's MIME type. But even a type="video/mp4; codecs=avc1" is not detailed enough for Firefox. Firefox needs the extra detail of for example type="video/mp4; codecs=avc1.64001E". My problem is that I don't know where to get this 64001E part from.
The whole identification happens on server side. So far I was using ffprobe and that's perfectly supplies me JSON format output, like so:
ffprobe -select_streams v:0 -v info -of json -show_entries stream=codec_name,level,profile,width,height -i 1CE89B23-F9BD-43B9-805B-C49ACA9E5FFB_xxxxxxx.mp4
"streams": [
{
"codec_name": "h264",
"profile": "High",
"width": 1080,
"height": 1920,
"level": 50
}
]
}
I can get the profile and the level, but nothing like 64001E. In my local environment I also have mediainfo:
mediainfo 8038B652-106B-4FBB-BAD6-AF7E32913FDE_xxxxxxx.mp4
General
Complete name : 8038B652-106B-4FBB-BAD6-AF7E32913FDE_xxxxxxx.mp4
Format : MPEG-4
Format profile : Base Media
Codec ID : isom (isom/iso2/avc1/mp41)
File size : 1.18 MiB
Duration : 6 s 634 ms
Overall bit rate : 1 496 kb/s
Writing application : Lavf57.83.100
Video
ID : 1
Format : AVC
Format/Info : Advanced Video Codec
Format profile : High#L3
Format settings : CABAC / 5 Ref Frames
Format settings, CABAC : Yes
Format settings, Reference frames : 5 frames
Codec ID : avc1
Codec ID/Info : Advanced Video Coding
Duration : 6 s 634 ms
Bit rate : 1 396 kb/s
Width : 360 pixels
Height : 480 pixels
Display aspect ratio : 0.750
Frame rate mode : Constant
Frame rate : 30.000 FPS
Color space : YUV
Chroma subsampling : 4:2:0
Bit depth : 8 bits
Scan type : Progressive
Bits/(Pixel*Frame) : 0.269
Stream size : 1.10 MiB (93%)
Writing library : x264 core 152 r2854 e9a5903
Encoding settings : cabac=1 / ref=5 / deblock=1:0:0 / analyse=0x3:0x113 / me=hex / subme=8 / psy=1 / psy_rd=1.00:0.00 / mixed_ref=1 / me_range=16 / chroma_me=1 / trellis=2 / 8x8dct=1 / cqm=0 / deadzone=21,11 / fast_pskip=1 / chroma_qp_offset=-2 / threads=15 / lookahead_threads=2 / sliced_threads=0 / nr=0 / decimate=1 / interlaced=0 / bluray_compat=0 / constrained_intra=0 / bframes=3 / b_pyramid=2 / b_adapt=1 / b_bias=0 / direct=3 / weightb=1 / open_gop=0 / weightp=2 / keyint=250 / keyint_min=25 / scenecut=40 / intra_refresh=0 / rc_lookahead=50 / rc=crf / mbtree=1 / crf=17.0 / qcomp=0.60 / qpmin=0 / qpmax=69 / qpstep=4 / ip_ratio=1.40 / aq=1:1.00
Codec configuration box : avcC
Audio
ID : 2
Format : AAC LC
Format/Info : Advanced Audio Codec Low Complexity
Codec ID : mp4a-40-2
Duration : 6 s 632 ms
Duration_LastFrame : -9 ms
Bit rate mode : Constant
Bit rate : 90.4 kb/s
Channel(s) : 1 channel
Channel layout : C
Sampling rate : 44.1 kHz
Frame rate : 43.066 FPS (1024 SPF)
Compression mode : Lossy
Stream size : 73.2 KiB (6%)
Default : Yes
Alternate group : 1
What we see here is that the AAC part has a longer Codec ID mp4a-40-2, but the video stream is still just avc1.
I'm looking at lists https://tools.woolyss.com/html5-canplaytype-tester/ and https://wiki.whatwg.org/wiki/Video_type_parameters and I think maybe there's a programmatic way to convert the codec profile + level to the code what the MIME type codec specification has.
In https://developer.mozilla.org/en-US/docs/Web/Media/Formats/codecs_parameter I see that "avc1.4d002a" means Main Profile, Level 4.2. Looking at the list I linked earlier I figured that the 6 hex digits can be broken down into groups of two. The last two is the level. In this latest example the Level is 4.2, we just have to remove the dot => it becomes 42, which is 2a hex. The other 4 hex digits are related to the profile as Main, High, etc and then Progressive, but I haven't found a definition yet, and I wonder if the ffprobe is able to output things like High 4:2:2 Intra Level or High Progressive Level. We'll see.
https://www.rfc-editor.org/rfc/rfc6381#page-12 has some examples, but I followed the links and still don't see any definitive list or anything.
The ITU-T H.264 specification Annex A lists 14 profiles. In those listings there's a profile_idc mentioned, which seems to be the decimal for the first two hex digits, for example High's profile_idc is 100 decimal, which is 64 hexa. Now we just need to figure out the middle two hexa digits. Preferably a GitHub repo source file would be great where these things are curated into a sane concise const literal array.

It is described in e.g. Mozilla link ("PPCCLL is six hexadecimal digits specifying the profile number (PP), constraint set flags (CC), and level (LL)"). If you don't find a tool fitting your needs, we could extend e.g. MediaInfo for that, let us know.
Note: the CC indicated in the list are the expected flags, not the ones really in the file, it should be OK 99.99% of the time but you can not be sure it is the real content. MediaInfo internally reads the flags but it does not export them for the moment.

Related

How do I feed CD audio tracks into an ALSA-driven sound output device?

I'm using a USB CD/DVD drive without built-in sound decoder and controlling it via ALSA, which already works. The host is a Raspberry Pi 3B with the current Raspbian. Here is the corresponding config file:
pi#autoradio:/etc $ cat asound.conf
pcm.dmixer {
type dmix
ipc_key 1024
ipc_perm 0666
slave {
pcm "hw:0,0"
period_time 0
period_size 1024
buffer_size 4096
rate 192000
format S32_LE
channels 2
}
bindings {
0 0
1 1
}
}
pcm.dsnooper {
type dsnoop
ipc_key 2048
ipc_perm 0666
slave
{
pcm "hw:0,0"
period_time 0
period_size 1024
buffer_size 4096
rate 192000
format S32_LE
channels 2
}
bindings {
0 0
1 1
}
}
pcm.duplex {
type asym
playback.pcm "dmixer"
capture.pcm "dsnooper"
}
pcm.!default {
type plug
slave.pcm "duplex"
}
ctl.!default {
type hw
card 0
}
To read the music from CD-DA, I'm gonna use the CDIO++ library. Its cd-info utility recognises both the drive, and the audio CD:
pi#autoradio:/etc $ cd-info
cd-info version 2.1.0 armv7l-unknown-linux-gnueabihf
CD location : /dev/cdrom
CD driver name: GNU/Linux
access mode: IOCTL
Vendor : MATSHITA
Model : CD-RW CW-8124
Revision : DA0D
Hardware : CD-ROM or DVD
Can eject : Yes
Can close tray : Yes
Can disable manual eject : Yes
Can select juke-box disc : No
Can set drive speed : No
Can read multiple sessions (e.g. PhotoCD) : Yes
Can hard reset device : Yes
Reading....
Can read Mode 2 Form 1 : Yes
Can read Mode 2 Form 2 : Yes
Can read (S)VCD (i.e. Mode 2 Form 1/2) : Yes
Can read C2 Errors : Yes
Can read IRSC : Yes
Can read Media Channel Number (or UPC) : Yes
Can play audio : Yes
Can read CD-DA : Yes
Can read CD-R : Yes
Can read CD-RW : Yes
Can read DVD-ROM : Yes
Writing....
Can write CD-RW : Yes
Can write DVD-R : No
Can write DVD-RAM : No
Can write DVD-RW : No
Can write DVD+RW : No
__________________________________
Disc mode is listed as: CD-DA
I've already got some code to send the PCM data to the sound card and some insight regarding the (rather poorly documented) CDIO API (I know that the readSectors() method is used for reading sound data from the CD sector after sector), but not really a clue on how to hand over the data from the CD-DA input to the ALSA output routine correctly.
Please nopte that mplayer is off-limits to me as this routine will be a part of a larger solution.
Any help would be greatly appreciated.
UPDATE: Does the different block size of an audio CD (2,352 bytes) and of the sound output (910 bytes, at least in my particular case) matter?
CD audio data is just two channels of little-endian 16-bit samples at 44.1 kHz.
If you output the data to the standard output, you can pipe it into your sound-playing program, or aplay:
./my-read-cdda | ./play 44100 2 99999
./my-read-cdda | aplay --file-type raw --format cd
If you want to do everything in a single program, replace the read(0, ...) with readSectors(). (The buffer size does not need to have any relation with ALSA's period size or buffer size.)

required size of a configuration file for a HX1K (in "SPI slave" mode)

I am reworking the programmer for the Olimex iCE40HX1K board (targetted towards a STM32F103 ma) where I also would like to implement the "SPI Slave" mode to configure an image directly into RAM without using the serial flash.
Looking at the Lattice "programming and Configuration guide" (page 11), it is noted in table 8 that a EPROM for a ICE40-LP/LX1K must be at least 34112 bytes. (which -I guess- means that the configuration-files can be up to that size).
However, all images I have (sofar) created with the icestorm tools are 32220 octets.
I am a bit puzzled here.
Can somebody explain the difference between these two figures?
Does the HX1K need a configuration-file of 32220 or 34112 bytes?
I don't know how Lattice arrived at this number. A complete HX1K bin file with BRAM initialization but without comment and without multiboot header is 32220 bytes in size. The (optional) multiboot header would add another 160 bytes (32220 + 160 = 32380). The lattice tools usually add about 80 bytes to the comment field (32220 + 80 = 32300). Whatever I do, all numbers I have are more than 1000 short of 34112.
I don't know if there is a maximum length for the comment. Maybe there is and 34112 is the size of a bit stream with a comment of maximum length?
34112 - 32220 = 1892. Maybe someone decided to add 8kB (8192 bytes) just in case, but that person accidentally swapped the first two digits? Idk..
If you don't care about comments or multiboot headers, then iCE40 1K bit-streams have a fixed size, and that size is 32220 bytes.

How to transmit data using GFSK modulation?

I am new to this domain.
I want to transmit data using GFSK modulation using GNU Radio which response to the following specifications :
Deviation : +/-2,4 kHz +0,2%
Modulation index : 2
Filter index : 0.5 BT
Bitrate : 2400 bit/s
I want to transmit this data (data is in HEXA):
Preamble : 55 55
Synchro : F6 72
L-field ( data + CRC) : 20
Please start with the gr-digital (part of GNU Radio dist) GFSK Mod block. Take a peek at the source python, and then you'll want to look at the gr-analog frequency_modulation.h source for details about FM, specifically frequency deviation. For some reason, the doxygen output does not generate the embedded formula, which is why looking at the header file is suggested.
Then I suggest you do some simple experiments in GNU Radio Companion to better understand the GFSK Mod block.
Hope this helps.
I have based on this example of FSK transmitter, https://nccgroup.github.io/RFTM/fsk_transmitter.html
so you need juste to put the value of BT equal to 0.5 to make a GFSK transmitter. And everthing works correctely.

HLS MPEG Transport Stream Index File

I am currently looking to add Trick and Play functionality to HTTP Live Streaming (HLS) Server. For Trick and Play functionality to work, generally MPEG Transport Streams are pre-indexed. What is the general format of the Transport Stream Index files and how can I determine the I-frame in Transport Stream using the Index files?
I am using Transport Stream and Index file from here.
Each live555 TS index record is 11 bytes long:
- Record Type: 1 byte
- Start Offset: 1 byte
- Size: 1 byte
- PCR (integer part): 3 bytes (little-endian)
- PCR (fractional part): 1 byte
- Transport Packet Number: 4 bytes (little-endian)
Your sample is H.264 so the Record Type to look for is:
RECORD_NAL_H264_IFRAME = 9, // H.264
Source

How do I get all the information regarding the header of an audio file?

How do I get all the information regarding the header of an audio file, so it can be displayed in a readable format like ASCII values?
The audio file maybe of any format, most preferably .wav format.
EDIT:- OS can be windows 8.1 or ubuntu. I actually have to understand all the properties of the file like whether it is mono or stereo, its encoding, etc. maybe specifically .wav file, i would say.
I have knowledge about the C++ language, so that would be better.
There is a very powerful command you can use in a bash script: sox.
To get all the info you need about a wav file, you just have to run:
soxi file.wav
and you'll get something like:
Input File : 'file.wav'
Channels : 1
Sample Rate : 8000
Precision : 16-bit
Duration : 00:02:08.40 = 1027236 samples ~ 9630.34 CDDA sectors
File Size : 2.05M
Bit Rate : 128k
Sample Encoding: 16-bit Signed Integer PCM
sox is available for Windows as well, although I have never used there.
You can use FFPROBE utility from FFMPEG: http://ffmpeg.org/
"ffprobe" gathers information from multimedia streams and prints it in human- and machine-readable fashion.
For example it can be used to check the format of the container used by a multimedia stream
and the format and type of each media stream contained in it.
Code
ffprobe -i <file_name>
ffprobe -i myfile.wav
Output will look something like this
Input #0, wav, from 'myfile.wav':
Duration: 00:04:16.88, bitrate: 16 kb/s
Stream #0:0: Audio: g729 ([131][0][0][0] / 0x0083), 8000 Hz, 2 channels, s16p, 16 kb/s
Output Explanation:
g729 is encoding type here.
16 kb/s is the bit-rate.
2 channels
Sample rate 8000 Hz
For detailed information
ffprobe -i <file_name> -show_streams