How to extract frames from a video using WebCodecs in Chrome 86

WebCodecs was released in Chrome 86, but there's no real code example of how to use it yet. Given a video URL, how can I extract video frames as ImageData using WebCodecs?

What you describe is the entire, fairly complex process of acquiring raw bitmap-like data (e.g. something you can draw onto a canvas) from a formatted file or a stream of data chunks.
In the case of files (including the case where your URL points to a complete file, such as an .mp4 file), this generally consists of two steps:
1. Parsing the container file into individual chunks of encoded video and/or audio
2. Decoding those chunks of encoded video/audio
WebCodecs only facilitates step 2 of this process, i.e. decoding. The reasoning behind this decision was that parsing the container is computationally trivial, so you can already do it efficiently with the File APIs, but you still need to implement the container parsing/processing yourself.
Luckily, plenty of libraries exist already, many of which ironically existed long before the emergence of the WebCodecs API.
MP4Box is one example, helping you acquire encoded video and audio chunks, which you can then feed into a VideoDecoder or AudioDecoder.
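To get those chunks out of MP4Box in the first place, you feed it the file's bytes and tell it which track to extract samples from. Here is a rough, untested sketch assuming the mp4box.js API (createFile, onReady, setExtractionOptions, appendBuffer); videoUrl is a placeholder for your own URL:

const mp4BoxFile = MP4Box.createFile();

mp4BoxFile.onReady = (info) => {
    const videoTrack = info.videoTracks[0];
    // ask MP4Box to deliver encoded samples for this track via onSamples
    mp4BoxFile.setExtractionOptions(videoTrack.id, null, { nbSamples: 100 });
    mp4BoxFile.start();
};

// fetch the whole file and hand it to MP4Box
// (fine for small files; stream it chunk by chunk for large ones)
fetch(videoUrl)
    .then((response) => response.arrayBuffer())
    .then((buffer) => {
        buffer.fileStart = 0; // mp4box.js needs the byte offset of each appended buffer
        mp4BoxFile.appendBuffer(buffer);
        mp4BoxFile.flush();
    });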
With MP4Box, the key piece of your code will be centered around the onSamples callback you provide, and it'll look something like this:
mp4BoxFile.onSamples = (trackId, user, chunks) =>
{
    for (let i = 0; i < chunks.length; i++)
    {
        let chunk = chunks[i];
        let encodedChunk = new EncodedVideoChunk({
            // you'll need to deep-inspect chunk to figure these out
            type: "key", // or "delta"
            timestamp: ...
            duration: ...
            data: chunk.data
        });
        // pass encodedChunk to a VideoDecoder instance's decode method
    }
};
This is just a rough sketch of how your code will probably look; it won't work without more inspection, and it will take a lot of trial and error, because this is very low-level stuff.
WebCodecs is not the silver bullet you probably expected, but it can help you build one.
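To close the loop on the original question (frames as ImageData), the decoding side could look roughly like the sketch below. This is untested, and the codec string and dimensions in particular are assumptions you would have to read from the container (for H.264 in MP4 you typically also need to pass the avcC box as description):

const canvas = document.createElement('canvas');
const ctx = canvas.getContext('2d');

const decoder = new VideoDecoder({
    output: (frame) => {
        canvas.width = frame.displayWidth;
        canvas.height = frame.displayHeight;
        ctx.drawImage(frame, 0, 0);
        const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height);
        frame.close(); // always release frames, they hold decoder resources
        // ...do something with imageData...
    },
    error: (e) => console.error(e)
});

decoder.configure({
    codec: "avc1.42E01E", // assumption: baseline H.264; derive the real value from the file
    codedWidth: 1920,     // likewise, take these from the track info
    codedHeight: 1080
});

// then, inside onSamples: decoder.decode(encodedChunk);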

Related

Read binary files without having them buffered in the volume block cache

Older, now deprecated, macOS file system APIs provided flags to read a file unbuffered.
I seek a modern way to accomplish the same, so that I can read a file's data into memory without it being cached needlessly somewhere else in memory (such as the volume cache).
Reading with fread after calling setvbuf(fp, NULL, _IONBF, 0) does not have the desired effect in my tests, for example. I am looking for other low-level functions that let me read into a prepared memory buffer and avoid having the whole data buffered elsewhere.
Background
I am writing a file search program. It reads large amounts of file content (many GBs) that isn't and won't be used by the user otherwise. It would be a waste to have all this data cached in the volume cache as it'll soon get purged by further reads again, anyway. It'll also likely lead to purging file data that's actually in use by the user or system, causing more cache misses.
Therefore, I should be able to tell the system that I do not need the file data cached. The little caching needed for cluster boundaries is not an issue; it's the many large chunks that I read briefly into memory to search that do not need to be cached.
Two suggestions:
Use the read() system call instead of stdio.
Disable data caching with the F_NOCACHE option for fcntl().
In Swift that would be something like (error checking omitted for brevity):
import Foundation

let path = "/path/to/file"

// open() + read() instead of stdio, so no FILE* buffering is involved
let fd = open(path, O_RDONLY)

// tell the kernel not to keep this file's data in the buffer cache
fcntl(fd, F_NOCACHE, 1)

var buffer = Data(count: 1024 * 1024)
buffer.withUnsafeMutableBytes { ptr in
    // read() returns the number of bytes actually read; check it in real code
    let amount = read(fd, ptr.baseAddress, ptr.count)
}
close(fd)

Reducing CPU usage of navigator.webkitGetUserMedia (Electron: DesktopCapturer)

I'm using navigator.webkitGetUserMedia to capture a screenshot of a window once every second by assigning the returned stream to a <video>, copying it to a <canvas>, and saving the Buffer to a file.
The CPU usage in my application is consistently high and I've pinpointed it to this area.
Code
// Initialize the video, canvas, and ctx
var localStream,
    _video = document.querySelector('#video'),
    _canvas = document.querySelector('#canvas'),
    _ctx = _canvas.getContext('2d'),
    sourceName = 'my-window-id';

// Load the stream from navigator.webkitGetUserMedia
navigator.webkitGetUserMedia({
    audio: false,
    video: {
        mandatory: {
            chromeMediaSource: 'desktop',
            chromeMediaSourceId: sourceName,
            minWidth: 1920,
            maxWidth: 1920,
            minHeight: 1080,
            maxHeight: 1080
        }
    }
}, gotStream, getUserMediaError);

function gotStream(stream) {
    // Use the stream in our <video>
    _video.src = window.URL.createObjectURL(stream);
    // Reference the stream locally
    localStream = stream;
}

function captureState() {
    var buffer,
        dataURL;
    // Draw <video> to <canvas> and convert to buffer (image data)
    _ctx.drawImage(_video, 0, 0);
    dataURL = _canvas.toDataURL('image/png');
    buffer = new Buffer(dataURL.split(",")[1], 'base64');
    // Create an image from the data
    fs.writeFileSync('screenshot.png', buffer);
}

// Capture state every second
setInterval(function() {
    captureState();
}, 1000);
This code may not run as-is; it's a simplified version of what I have in my code to make it Stack Overflow readable.
Things I've Tried
_video.pause() and _video.play() when needed. Didn't seem to change CPU usage.
_video.stop(). This means I would have to get the stream again which causes a spike in CPU usage worse than keeping it open.
My best lead right now is to change the frame rate by adding:
optional: [
    { minFrameRate: 1 },
    { frameRate: 1 }
]
Extremely low frame rate would be fine. However, I haven't been able to determine if the frameRate setting works in this case. The docs don't have it listed and I don't have the newer mediaDevices.getUserMedia available.
Is it possible to set extremely low frame rates (or any at all) for navigator.webkitGetUserMedia?
Has anyone been able to reduce CPU usage of the stream in any other way?
Any alternative methods of achieving the same goal (state capture on interval) would also be helpful.
Thanks!
Side Note
This is in an Electron app on Windows using DesktopCapturer to get the chromeMediaSourceId.
Update on CPU Usage
Cost of running stream: 6% CPU Usage
Calling captureState every 1000ms: 5% CPU Usage
Total Current: 11%
Currently working on reducing #2 based on the recommendations of Csaba Toth so far. I should be able to reduce captureState by changing how the canvas is captured. Will update when that's done.
For #1, if I can't avoid capturing the video stream I'll have to just try to cap the total CPU usage at just over 6% by optimizing #2.
There's some unnecessary base64 encoding and extra work going on here; the way you get hold of the data is odd:
dataURL = _canvas.toDataURL('image/png');
buffer = new Buffer(dataURL.split(",")[1], 'base64');
Take a look at how the QR decoder accesses the image instead: https://github.com/bulldogearthday/booths/blob/master/scripts/qrdecoder.js#L1991
var canvas_qr = document.getElementById("qr-canvas");
var context = canvas_qr.getContext('2d');
qrcode.width = canvas_qr.width;
qrcode.height = canvas_qr.height;
qrcode.imagedata = context.getImageData(0, 0, qrcode.width, qrcode.height);
(The other side of the software did a drawImage to the canvas earlier.) Now the task would be to find a method that won't unnecessarily convert the PNG data into base64 and then decode it. I see this URI encoding advised everywhere because it takes fewer lines, but performance-wise an unnecessary encoding/decoding phase is undesirable. 1920x1080 PNGs are big and not meant for base64 inlining. Since you are in Node.js anyway, try to use https://github.com/niegowski/node-pngjs or a similar library to save the image data, as sketched below.
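A minimal, untested sketch of that approach (it assumes the pngjs package and reuses the _canvas/_ctx/_video variables from the question):

const fs = require('fs');
const { PNG } = require('pngjs');

function captureState() {
    _ctx.drawImage(_video, 0, 0);

    // raw RGBA pixels straight from the canvas, no base64 round trip
    const imageData = _ctx.getImageData(0, 0, _canvas.width, _canvas.height);

    const png = new PNG({ width: _canvas.width, height: _canvas.height });
    png.data = Buffer.from(imageData.data.buffer);

    // pack() streams the PNG-encoded bytes to disk
    png.pack().pipe(fs.createWriteStream('screenshot.png'));
}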
There's always a trade-off between space and time, so if time really matters you can trade lower compression for higher performance: https://github.com/wheany/js-png-encoder
There is a trade-off here, since the base64 URI encoding examples take advantage of the browser's native (C++, fast) PNG encoding, but then do unnecessary base64 encoding+decoding. node-pngjs would perform PNG encoding in JS land, which may not be as performant as the browser's internal encoding. The best approach would be to find a way to leverage the browser's encoding without the base64 step.
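One way to do that (a sketch, not tested in this exact Electron setup; it assumes canvas.toBlob is available in the renderer) is to let the browser produce the PNG bytes directly as a Blob and skip the data URL entirely:

function captureState() {
    _ctx.drawImage(_video, 0, 0);

    // the browser's native encoder still produces the PNG, but as binary, not base64
    _canvas.toBlob(function (blob) {
        var reader = new FileReader();
        reader.onload = function () {
            // reader.result is an ArrayBuffer holding the raw PNG bytes
            fs.writeFile('screenshot.png', Buffer.from(reader.result), function (err) {
                if (err) console.error(err);
            });
        };
        reader.readAsArrayBuffer(blob);
    }, 'image/png');
}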
Earlier advice
From what you show, I think your main problem is that you perform _ctx.drawImage(_video, 0, 0); and other operations in your gotStream.
Here is a Progressive Web App of mine, it performs QR code scanning too: https://github.com/bulldogearthday/booths/blob/master/scripts/app.js
Notice that in the "gotStream" (which is anonymous in my case https://github.com/bulldogearthday/booths/blob/master/scripts/app.js#L67) I only wire up the stream to the canvas.
My situation is easier because I don't have to enforce a size (I hope you don't hard-wire those screen-size pixel numbers), but I also perform processing periodically (a QR code scan attempt every 500ms). I originally used a timer for that, but it stopped working after some iterations/ticks, so technically I issue a single timeout, and every time it fires I re-issue a new one. See the initial timeout https://github.com/bulldogearthday/booths/blob/master/scripts/app.js#L209 and the periodic re-issue: https://github.com/bulldogearthday/booths/blob/master/scripts/app.js#L231
As you can see the only place I do "heavy lifting" is in the app.scanQRCode which happens only twice a second. There I process the content of the canvas:
https://github.com/bulldogearthday/booths/blob/master/scripts/app.js#L218
I advise you to restructure your code that way: either set up a timer ticking every second, or re-issue timeouts as I do, and perform the capture+save only in that section; a minimal sketch of the timeout pattern follows below. Hopefully that will lighten the CPU load, although encoding a 1920x1080 PNG once a second may still stress the CPU (there will be PNG encoding).
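A sketch of the re-issued timeout pattern, using the captureState function and one-second interval from the question:

function scheduleCapture() {
    setTimeout(function () {
        captureState();    // do the heavy lifting only here, once per tick
        scheduleCapture(); // re-issue a fresh timeout instead of a long-lived interval
    }, 1000);
}

scheduleCapture();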
(That's beneficial if you want to go for individual images. If you want to end up with a video anyway, then I'd try the route of enforcing a 1 FPS video as you suggested and capturing the video stream directly instead of individual images. But for the CPU load my suggestion should help, IMHO.)
In the README (https://github.com/bulldogearthday/booths) you can see one of the main sources I looked at for getUserMedia: https://github.com/samdutton/simpl/blob/gh-pages/getusermedia/sources/js/main.js
I don't fiddle with issuing .play() or .pause() or anything. As a matter of fact, my code waits until it receives the signal that playback has started (it starts by itself by default, at least for cameras): document.getElementById('qrVideo').addEventListener('playing', app.saveVideoSize, false); https://github.com/bulldogearthday/booths/blob/master/scripts/app.js#L67 My intention was to not disturb the natural process if possible; in my case I detect the video size in this gentle way. Looking at DesktopCapturer, they also don't perform anything extra in the gotStream in their README https://github.com/electron/electron/blob/master/docs/api/desktop-capturer.md, and as shown, ideally you just wire up the video stream with the canvas.

How to retrieve and iterate over sound data chunks in iOS?

Given an audio file 'coolsound.aif', how might I approach the task of retrieving the sound data chunks (SSND chunks) and iterating over them to do some arbitrary processing? I hope to be able to achieve something like the following:
/*
 * Pseudocode of what I'd like to do
 */

// get SSND chunks out of audio file somehow
Array soundDataChunks = getSSNDChunksFromSoundFile("coolsound.aif");

// iterate over each chunk
foreach(soundDataChunks as chunk){
    // Now iterate over each element in the waveForm data array
    foreach(chunk.waveForm as w){
        // Just log it to debug console for now
        Log(w);
    }
}
Other info:
- My aim is to use the waveform data to visualize the audio file graphically.
- The audio file was recorded using AudioToolbox in this manner.
- SSND chunk has the structure as appears in this source:
typedef struct {
    ID            chunkID;
    long          chunkSize;
    unsigned long offset;
    unsigned long blockSize;
    unsigned char WaveformData[];
} SoundDataChunk;
There are a few different APIs you can use, and it all depends on how much control you want and what you plan on doing with the audio data. The ExtendedAudioFile API is made for basic operations like getting the audio data and drawing it.
It may seem like a lot of code to make this happen, e.g. you have to create an AudioStreamBasicDescription object and configure it just right, but it allows you to read an entire audio file very quickly and access all the samples for drawing.
GW

Publishing a stream using librtmp in C/C++

How do I publish a stream using the librtmp library?
I read the librtmp man page; for publishing, RTMP_Write() is used.
I am doing it like this:
//Code
//Init RTMP code
RTMP *r;
char uri[]="rtmp://localhost:1935/live/desktop";
r= RTMP_Alloc();
RTMP_Init(r);
RTMP_SetupURL(r, (char*)uri);
RTMP_EnableWrite(r);
RTMP_Connect(r, NULL);
RTMP_ConnectStream(r,0);
Then, to respond to ping and other messages from the server, I am using a thread like the following:
// Thread
while (ThreadIsRunning && RTMP_IsConnected(r) && RTMP_ReadPacket(r, &packet))
{
    if (RTMPPacket_IsReady(&packet))
    {
        if (!packet.m_nBodySize)
            continue;

        RTMP_ClientPacket(r, &packet); // This takes care of handling ping/other messages
        RTMPPacket_Free(&packet);
    }
}
After this, I am stuck on how to use RTMP_Write() to publish a file to a Wowza media server.
In my own experience, streaming video data to an RTMP server is actually pretty simple on the librtmp side. The tricky part is to correctly packetize video/audio data and read it at the correct rate.
Assuming you are using FLV video files, as long as you can correctly isolate each tag in the file and send each one using one RTMP_Write call, you don't even need to handle incoming packets.
Most of that work is understanding how FLV files are structured.
The official specification is available here: http://www.adobe.com/devnet/f4v.html
First, there's a 9-byte header. This header must not be sent to the server; it is only read through in order to make sure the file really is FLV.
Then there is a stream of tags. Each tag has an 11-byte header that contains the tag type (video/audio/metadata), the body length, and the tag's timestamp, among other things.
The tag header can be described using this structure:
typedef struct __flv_tag {
    uint8     type;
    uint24_be body_length;        /* in bytes, total tag size minus 11 */
    uint24_be timestamp;          /* milli-seconds */
    uint8     timestamp_extended; /* timestamp extension */
    uint24_be stream_id;          /* reserved, must be "\0\0\0" */
    /* body comes next */
} flv_tag;
The body length and timestamp are presented as 24-bit big-endian integers, with a supplementary byte to extend the timestamp to 32 bits when necessary (that happens at roughly the 4-hour mark).
Once you have read the tag header, you can read the body itself as you now know its length (body_length).
After that there is a 32-bit big endian integer value that contains the complete length of the tag (11 bytes + body_length).
You must write the tag header + body + previous tag size in one RTMP_Write call (else it won't play).
Also, be careful to send packets at the nominal frame rate of the video, else playback will suffer greatly.
I have written a complete FLV file demuxer as part of my GPL project FLVmeta that you can use as reference.
In fact, RTMP_Write() seems to require that you already have the RTMP packet formed in buf.
RTMPPacket *pkt = &r->m_write;
...
pkt->m_packetType = *buf++;
So you cannot just push the FLV data there; you need to split it into packets first.
There is a nice function, RTMP_ReadPacket(), but it reads from the network socket.
I have the same problem as you and hope to have a solution soon.
Edit:
There are certain bugs in RTMP_Write(). I've made a patch and now it works. I'm going to publish that.

Multithreading a FileStream in VB2005

I am trying to build a resource file for a website, basically jamming all the images into a compressed file that is then unpacked to the output buffers for the client.
My question is: in VB2005, can a FileStream be multithreaded if you know the size of the converted file, a la BitTorrent, so you can work on pieces of the filestream (the individual files in this case) and add them to the resource filestream as they are done instead of one at a time?
If you need something similar to the way torrents write to a file, this is how I would implement it:
Open a FileStream on thread T1, and create a queue "monitor" for step 2.
Create a queue that is read from T1 but written to by multiple network reader threads (each queue entry would look like: position to write at, size of the data buffer, data buffer).
Fire up the threads
:)
Anyway, from your comments, your problem seems to be a different one.
I have found something in a book, but I'm not sure if it works:
If you want to write data to a file, two parallel methods are available, WriteByte() and Write(). WriteByte() writes a single byte to the stream:

byte NextByte = 100;
fs.WriteByte(NextByte);

Write(), on the other hand, writes out an array of bytes. For instance, if you initialized the ByteArray mentioned before with some values, you could use the following code to write out the first nBytes of the array:

fs.Write(ByteArray, 0, nBytes);
Citation from: Nagel, Christian, Bill Evjen, Jay Glynn, Morgan Skinner, and Karli Watson. "Chapter 24 - Manipulating Files and the Registry". Professional C# 2005 with .NET 3.0. Wrox Press, 2007. Books24x7. http://common.books24x7.com/book/id_20568/book.asp (accessed July 22, 2009)
I'm not sure if you're asking if a System.IO.FileStream object can be read from or written to in a multi-threaded fashion. But the answer in both cases is no. This is not a supported scenario. You will need to add some form of locking to ensure serialized access to the resource.
The documentation calls out multi-threaded access to the object as an unsupported scenario
http://msdn.microsoft.com/en-us/library/system.io.filestream.aspx