How do I process video frames in HTML5 quickly? - html5-video

I am testing HTML5's video API. The plan is to play a video with an effect applied, like making it black and white. I have a video element and a canvas working together using a buffer: I copy the current video frame to a scratch buffer where I can process it. The problem is the rate at which it runs.
HTML5's video API has a 'timeupdate' event. I tried using its handler to process frames, once per frame, but it fires at a slower rate than the video plays.
Any ideas to speed up processing frames?

You can get much more frequent redraws by using requestAnimationFrame to determine when to update your canvas, rather than relying on timeupdate, which only fires every 200-250ms. That's definitely not enough for frame-accurate animation. requestAnimationFrame will fire at most every 16ms (approx. 60fps), and the browser will throttle it as necessary and sync it with the display's refresh. It's pretty much exactly what you want for this sort of thing.
Even with higher frame rates, processing video frames with a 2D canvas is going to be pretty slow. For one thing, you're processing every pixel sequentially on the CPU, running JavaScript. The other problem is that you're copying around a lot of memory. There's no way to directly access pixels in a video element. Instead, you have to copy the whole frame into a canvas first. Then you have to call getImageData, which not only copies the whole frame a second time but also has to allocate the whole block of memory again, since it creates a new ImageData every time. It would be nice if you could copy into an existing buffer, but you can't.
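To make the loop concrete, here is a minimal sketch of the requestAnimationFrame-driven version, assuming a video element and an equally-sized canvas already exist in the page (the element lookup and grayscale effect are illustrative):

```javascript
// Pure per-pixel grayscale pass over RGBA data: the kind of CPU loop
// that getImageData forces you into.
function toGrayscale(pixels) {
  for (var i = 0; i < pixels.length; i += 4) {
    // Standard luma weights; alpha (pixels[i + 3]) is left untouched.
    var luma = 0.299 * pixels[i] + 0.587 * pixels[i + 1] + 0.114 * pixels[i + 2];
    pixels[i] = pixels[i + 1] = pixels[i + 2] = luma;
  }
  return pixels;
}

function startProcessing(video, canvas) {
  var ctx = canvas.getContext('2d');
  function tick() {
    if (!video.paused && !video.ended) {
      // Copy #1: video frame -> canvas.
      ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
      // Copy #2 (plus a fresh allocation every call): canvas -> ImageData.
      var frame = ctx.getImageData(0, 0, canvas.width, canvas.height);
      toGrayscale(frame.data);
      ctx.putImageData(frame, 0, 0);
    }
    requestAnimationFrame(tick); // keep scheduling; the browser throttles as needed
  }
  requestAnimationFrame(tick);
}
```

In a page you would call something like startProcessing(document.getElementById('myvideo'), document.getElementById('mycanvas')) once the video can play.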
It turns out you can do extremely fast image processing with WebGL. I've written a library called Seriously.js for exactly this purpose. Check out the wiki for a FAQ and tutorial. There's a Hue/Saturation plugin you can use - just drop the saturation to -1 to get your video to grayscale.
The code will look something like this:
var composition = new Seriously();
var effect = composition.effect('hue-saturation');
var target = composition.target('#mycanvas');
effect.source = '#myvideo';
effect.saturation = -1;
target.source = effect;
composition.go();
The big downside of using WebGL is that not every browser or computer will support it: Internet Explorer is out, as is any machine with old or unusual video drivers, and most mobile browsers don't support it either. Support statistics are available online. But where it works, you can get very high frame rates on pretty large videos, even with much more complex effects.
(There is also a small issue with a browser bug that, oddly enough, shows up in both Chrome and Firefox. Your canvas will often be one frame behind the video, which is only an issue if the video is paused, and is most egregious if you're skipping around. The only workaround seems to be to keep forcing updates, even if your video is paused, which is less efficient. Please feel free to vote those tickets up so they get some attention.)

Related

In Vulkan, using FIFO, what happens when you don't have an image ready to present when the vertical blank arrives?

I'm new to graphics, and I've been looking at Vulkan presentation modes. I was wondering: in a situation where we've only got 2 images in our swapchain (one that the screen's currently reading from and one that's free), what happens if we don't manage to finish drawing to the currently free image before the next vertical blank? Do we do the presentation and get weird tearing, or do we skip the presentation and draw the same image again (I guess giving a "stuttering" effect)? Do we need to define what happens, or is it automatic?
As a side note, is this why people use longer swap chains? i.e. so that if you managed to draw out 2 images to your swap chain while the screen was displaying the last image but now you're running late, at least you can present the newer of the 2 images from before?
I'm not sure how much of this is specific to FIFO or mailbox mode: I guess with mailbox you'll already have used the newest image you've got, so you're stuck again?
(Diagram: 2-image swapchain, https://i.stack.imgur.com/rxe51.png)
Tearing never happens in regular FIFO (or mailbox) mode. When you present an image, this image will be used for all subsequent vblanks until a new image is presented. And since FIFO disallows tearing, in your case, the image will be fully displayed twice.
If you are using a 2-deep swapchain with FIFO, you have to produce each image on time in order to avoid stuttering. With longer swapchains and FIFO, you have more leeway to avoid visible stuttering. With longer swapchains and mailbox, you can get a similar effect, but there will be less visible latency when your application is running on-time.

Metal drawable musings... ugh

I have two issues in my Metal App.
My call to currentPassDescriptor is stalling. I have too many drawables, apparently.
I'm wholly confused on how to most performantly configure the multiple MTKViews I am using.
Issue (1)
I have a problem with currentPassDescriptor in my app. It is occasionally blocking (for 1.00s) which, according to the docs, is because there is no currentDrawable available.
Background: I have 4 HD 1920x1080 videos playing concurrently, tiled out onto a 3840x2160 second external display as a debugging configuration. The pixel buffers of these AVPlayer instances are captured by 4 independent CVDisplayLink callbacks and, from within each callback, there is the draw call to its assigned MTKView. A total of 4 MTKViews are subviews tiled on a single NSWindow, and are configured for manual drawing.
I'm using CVDisplayLink callbacks manually. If I don't, then I get stutter when mousing up on the app’s menus, for example.
Within each draw call, I do a bit of kernel shader work then attempt to obtain the currentPassDescriptor. If successful, I do one pass of a fragment/vertex shader and then present the drawable. My code flow follows Apple’s sample code as well as published examples.
According to the Metal System Trace, most draw calls take under 5ms. The GPU is about 20-25% utilized and about 25% of the GPU memory is free. I can also make the main thread usleep() for 1 second without any hiccups.
Without any user interaction, there's about a 5% chance of the videos stalling out in the first minute. If there's some UI work going on, I see it as windowServer work in Instruments. I also note that AVFoundation seems to cache about 15 frames of video onto the GPU for each AVPlayer.
If the cadence of the draw calls is upset, there's about a 10% chance that some or all of the videos stall: some stall completely, some stall with 1Hz updates, and some don't stall at all. There's also less chance of stalling when running Metal System Trace. The movies that have stalled seem to have done so on obtaining a currentPassDescriptor.
Having currentPassDescriptor block for ≈1s during a render loop seems like really poor design. So much so that I'm thinking of eschewing MTKView altogether and just drawing to a CAMetalLayer myself. But the docs on CAMetalLayer seem to indicate the same blocking behaviour will occur.
I also grab these 4 pixel buffers on the fly and render sub-size regions-of-interest to 4 smaller MTKViews on the main monitor; but the stutters still occur if this code is removed.
Is the drawable buffer limit per MTKView or per the backing CALayer? The docs for maximumDrawableCount on CAMetalLayer say the number needs to be 2 or 3. This question ties into the configuration of the views.
Issue (2)
My current setup is a 3840x2160 NSWindow with a single content view. This subclass of NSView does some hiding/revealing of the mouse cursor by introducing an NSTrackingRectTag. The MTKViews are tiled subviews on this content view.
Is this the best configuration? Namely, one NSWindow with tiled MTKViews… or should I do one MTKView per window?
I'm also not sure how to best configure these windows/layers — ie. by setting (or clearing) wantsLayer, wantsUpdateLayer, and/or canDrawSubviewsIntoLayer. I'm currently just setting wantsLayer to YES on the single content view. Any hints on this would be great.
Does adjusting these properties collapse all the available drawables to the backing layer only; are there still 2 or 3 per MTKView?
NB: I've attached a sample run of my Metal app. The longest 'work' on the top graph is just under 5ms. The clumps of green/blue are rendering on the 4 MTKViews. The 'work' alternates a bit because one of the videos is a 60fps source; the others are all 30fps.

Rendering video on HTML5 CANVAS takes huge amount of CPU

I am using the HTML5 canvas for rendering video, but the rendering is taking a huge amount of CPU. I am using GtkLauncher (with WebKit 1.8.0) for rendering the video on the canvas.
Can someone please shed some light on this? Is video rendering on canvas not efficient for embedded systems?
Also, I would like to know whether there is a way in the HTML5 video tag to learn the video's frame rate before I actually start to render the data on the canvas. I need this because I have to set the timer (used for drawing the video frames) to that same frame rate.
Thanks and Regards,
Souvik
Most likely the video rendering is not hardware-accelerated and needs to
Decode in software
Resize in software
You did not give system details, so this is just a guess. By poking browser internals you can dig out the truth.
The video frame rate cannot be known beforehand, and in theory it can vary within one source. However, if you host the file yourself you can pre-extract this information using tools like ffmpeg and transfer the number in a side channel (e.g. using AJAX / JSON).
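On the client side, the side-channel idea can be sketched like this. The /video-meta.json endpoint and its {"fps": ...} shape are assumptions for illustration; only the interval arithmetic is essential:

```javascript
// Convert a frame rate (extracted server-side, e.g. with ffmpeg/ffprobe)
// into a draw-timer interval in milliseconds.
function frameIntervalMs(fps) {
  if (!fps || fps <= 0) {
    throw new Error('invalid fps');
  }
  return 1000 / fps;
}

function startDrawTimer(fps, drawFrame) {
  // Round to whole milliseconds; setInterval is not more precise anyway.
  return setInterval(drawFrame, Math.round(frameIntervalMs(fps)));
}

// Browser usage (not run here): fetch the pre-extracted metadata, then start
// the canvas-drawing timer at the source's frame rate.
// fetch('/video-meta.json')
//   .then(function (res) { return res.json(); })
//   .then(function (meta) { startDrawTimer(meta.fps, drawVideoFrame); });
```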

QTKit capture: what frame size to use?

I am writing a simple video messenger-like application, and therefore I need to get frames at some compromise size: small enough to fit into the available bandwidth, but without distorting the captured image.
To retrieve frames I am using the QTCaptureVideoPreviewOutput class, and I am successfully getting frames in the didOutputVideoFrame callback. (I need raw frames, mostly because I am using a custom encoder, so I just want "raw bitmaps".)
The problem is that these new iSight cameras give me literally huge frames.
Luckily, the class for capturing raw frames (QTCaptureVideoPreviewOutput) provides the method setPixelBufferAttributes, which allows me to specify what kind of frames I would like to get. If I am lucky enough to guess a frame size the camera supports, I can specify it and QTKit will switch the camera into that mode. If I am unlucky, I get a blurred image (because it was stretched/shrunk) and, most likely, non-proportional.
I have been searching through lists.apple.com and stackoverflow.com, and the answer is "Apple currently does not provide functionality to retrieve a camera's native frame sizes". Well, nothing I can do about that.
Maybe I should offer the most common frame sizes in settings and let the user try them to see what works? But what are those common frame sizes? Where could I get a list of the frame dimensions that UVC cameras usually generate?
For testing my application I am using a UVC-compliant camera, not an iSight. I assume not every user has an iSight either, and I am sure that even different iSight models produce different frame dimensions.
Or maybe I should switch the camera to its default mode, capture a few frames, see what sizes it generates, and at least get some proportions? That looks like a real hack and doesn't seem natural. And the image is most likely going to be blurred again.
Could you please tell me how you have dealt with this issue? I am sure I am not the first to face it. What approach would you choose?
Thank you,
James
You are right, the iSight camera produces huge frames. However, I doubt you can switch the camera into a different mode by setting pixel buffer attributes; more likely you are setting how QTCaptureVideoPreviewOutput processes the frames. Take a look at QTCaptureDecompressedVideoOutput if you have not done so yet.
We also use the sample buffer to get the frame size, so I would not call it a hack.
A more natural way would be to write your own QuickTime component that implements your custom encoding algorithm. In that case QuickTime could use it inside QTCaptureMovieFileOutput during the capture session. That would be the proper, but also the hard, way.

HTML5 Large canvas

I've noticed that when dynamically creating a large canvas (6400x6400), quite a lot of the time nothing gets drawn on it, while a small canvas works 100% of the time. As I don't know any better, I have no choice but to try to get the large canvas working correctly.
thisObj.oMapCanvas = jQuery( document.createElement('canvas') ).attr('width', 6400).attr('height', 6400).css('border','1px solid green').prependTo( thisObj.oMapLayer ).get(0);
// getContext and then drawing stuff here...
The purpose of the canvas is to simply draw a line between two nodes (images), which are within a div container that can be dragged around (viewport I think people call them).
What I "think" may be happening is that on a canvas resize it emptys the canvas, and that is interfering with the context drawing, as like I said previously it works all the time when the canvas is alot smaller.
Has anyone experienced this before and/or know any possible solutions?
That is an enormous sized canvas. 6400 x 6400 x 4 bytes per pixel is 156 MB, and your implementation may need to allocate two or more buffers of that size, for double buffering, or need to allocate video memory of that size as well. It's going to take a while to allocate and clear all that memory, and you may not be guaranteed to succeed at such an allocation. Is there a reason you need such an enormous canvas? You could instead try sizing your canvas to be only as large as necessary to draw the line between those two divs, or you could try using SVG instead of a canvas.
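As a sketch of that first suggestion, you can compute the smallest canvas that covers the line between the two node positions and draw in canvas-local coordinates. The helper below and its padding default are hypothetical; the endpoints would come from the nodes' page offsets:

```javascript
// Given two endpoints in page coordinates, return the smallest canvas
// rectangle containing the line, plus the endpoints translated into
// canvas-local space.
function lineCanvasRect(a, b, padding) {
  padding = padding || 2; // a little slack so line caps are not clipped
  var left = Math.min(a.x, b.x) - padding;
  var top = Math.min(a.y, b.y) - padding;
  return {
    left: left,
    top: top,
    width: Math.abs(a.x - b.x) + 2 * padding,
    height: Math.abs(a.y - b.y) + 2 * padding,
    from: { x: a.x - left, y: a.y - top },
    to: { x: b.x - left, y: b.y - top }
  };
}

// Browser usage (not run here): size and absolutely position a small canvas
// from the rect, then:
//   ctx.moveTo(rect.from.x, rect.from.y);
//   ctx.lineTo(rect.to.x, rect.to.y);
//   ctx.stroke();
```

This keeps the backing store proportional to the line's bounding box instead of the whole 6400x6400 map.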
Another possibility would be to try dividing your canvas up into large tiles, and only rendering those tiles that are actually visible on the screen. Google Maps does this with images, to only load images for the portion of the map that is currently visible (plus some extra one each side of the screen to make sure that when you scroll you won't need to wait for it to render), maintaining an illusion that there is an enormous canvas while really only rendering something a bit bigger than the window.
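The tile-selection step of that approach can be sketched as pure bookkeeping: given the viewport over the large virtual canvas, work out which fixed-size tiles need rendering, with one extra ring of tiles around the edges so scrolling doesn't expose blanks. The tile size and world dimensions here are assumptions:

```javascript
// Compute which tiles of a large virtual canvas intersect the viewport,
// plus a one-tile margin on each side. Each returned {x, y} names one
// tileSize x tileSize tile that would be backed by its own small canvas.
function visibleTiles(viewport, tileSize, worldSize) {
  var margin = 1; // extra ring of pre-rendered tiles
  var maxTx = Math.ceil(worldSize.width / tileSize) - 1;
  var maxTy = Math.ceil(worldSize.height / tileSize) - 1;
  var x0 = Math.max(0, Math.floor(viewport.x / tileSize) - margin);
  var y0 = Math.max(0, Math.floor(viewport.y / tileSize) - margin);
  var x1 = Math.min(maxTx, Math.floor((viewport.x + viewport.width) / tileSize) + margin);
  var y1 = Math.min(maxTy, Math.floor((viewport.y + viewport.height) / tileSize) + margin);
  var tiles = [];
  for (var ty = y0; ty <= y1; ty++) {
    for (var tx = x0; tx <= x1; tx++) {
      tiles.push({ x: tx, y: ty });
    }
  }
  return tiles;
}
```

On drag, you recompute the tile list, render any tiles not yet drawn, and drop tiles far outside the viewport, so only a window's worth of pixels is ever allocated.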
Most browsers that implement HTML5 are still in early beta - so it's quite likely they are still working the bugs out.
However, the resolution of the canvas you are trying to create is very high, much higher than most people's monitors can even display. Is there a reason you need it quite so large? Why not restrict the draggable area to something more in line with typical display resolutions?
I had the same problem! I was trying to use a big canvas to connect some divs. Eventually I gave up and drew the line in JavaScript using little images as pixels (I did it with divs first, but in IE the divs came out too big).