QTKit capture: what frame size to use? - objective-c

I am writing a simple video messenger-like application, so I need frames of some compromise size: small enough to fit into the available bandwidth, but without distorting the captured image.
To retrieve frames I am using the QTCaptureVideoPreviewOutput class, and I am successfully getting frames in the didOutputVideoFrame callback. (I need raw frames, mostly because I am using a custom encoder, so I would just like to get "raw bitmaps".)
The problem is that for these new iSight cameras i am getting literally huge frames.
Luckily, the classes for capturing raw frames (such as QTCaptureVideoPreviewOutput) provide a setPixelBufferAttributes method that lets me specify what kind of frames I would like to get. If I am lucky enough to guess a frame size that the camera supports, I can specify it and QTKit will switch the camera into that mode. If I am unlucky, I get a blurred image (because it was stretched or shrunk) that is most likely also out of proportion.
I have been searching through lists.apple.com and stackoverflow.com, and the answer is "Apple currently does not provide functionality to retrieve camera's native frame sizes". Well, there is nothing I can do about that.
Maybe I should offer the most common frame sizes in the settings and let the user try them to see what works? But what are these common frame sizes? Where could I get a list of the frame dimensions that UVC cameras usually generate?
For testing my application I am using a UVC-compliant camera, but not an iSight. I assume not every user has an iSight either, and I am sure that even different iSight models have different frame dimensions.
Or maybe I should switch the camera to the default mode, capture a few frames, see what sizes it generates, and at least get the proportions from that? This looks like a real hack and doesn't seem natural. And the image is most likely going to be blurred again.
Could you please help me: how have you dealt with this issue? I am sure I am not the first one to face it. What approach would you choose?
Thank you,

You are right, the iSight camera produces huge frames. However, I doubt you can switch the camera to a different mode by setting pixel buffer attributes. More likely, you are only setting how QTCaptureVideoPreviewOutput processes the frames. Take a look at QTCaptureDecompressedVideoOutput if you have not done so already.
We also use the sample buffer to get the frame size. So, I would not say it's a hack.
A more natural way would be to write your own QuickTime component that implements your custom encoding algorithm. QuickTime would then be able to use it inside QTCaptureMovieFileOutput during the capture session. That would be the proper way, but also a hard one.
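Once the native frame size has been read from the first sample buffers, choosing a smaller size with the same proportions is plain arithmetic. The sketch below is written in JavaScript purely for illustration; fitWithin and the 640x480 bound are hypothetical and not part of QTKit. In a QTKit application the resulting numbers would presumably go into the width and height entries of the pixel buffer attributes.

```javascript
// Hypothetical helper: scale a native frame size down so that it fits
// inside a bounding box while keeping the original aspect ratio.
function fitWithin(nativeWidth, nativeHeight, maxWidth, maxHeight) {
  // Never scale up: a factor above 1 would only blur the image.
  const scale = Math.min(maxWidth / nativeWidth, maxHeight / nativeHeight, 1);
  // Round down to even numbers (assumption: the custom encoder prefers them).
  const toEven = (value) => Math.max(2, Math.floor((value * scale) / 2) * 2);
  return { width: toEven(nativeWidth), height: toEven(nativeHeight) };
}

// A 1280x1024 camera limited to 640x480 gives 600x480, which keeps 5:4.
console.log(fitWithin(1280, 1024, 640, 480));
```

Because the scale factor is the smaller of the two ratios, the result never exceeds the bounding box in either dimension, so the image is never stretched.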


Expo camera real-time image analysis

I know how to take a picture with my camera using the “expo-camera” module, but I don’t know how to set up a system that takes a photo about 10 times a second and analyses the colors in the image, to be used for tracking. Expo camera can return images as base64, so I’m guessing I would have to use that, but I don’t know how to efficiently take pictures continuously and analyse them.
react-native-vision-camera might be more appropriate for what you're trying to achieve. It allows you to write frame processors to analyse frame contents.
But if you wanted to use expo-camera then you could do it the way you described, but you'd need to then find a module to take the Base64 encoded image and turn it into an array or stream of pixel values. This is likely to be very slow and use a lot of memory to run on the JS thread because each image from a standard 12MP camera is going to mean you'll be looping over an array of 12 million RGB values.
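To give a feel for the per-pixel loop described above, here is a minimal sketch of a color analysis over raw pixel data. It assumes the Base64 image has already been decoded into a flat RGBA byte array by some other module (not shown); averageColor and its step parameter are hypothetical names. Sampling only every n-th pixel is one way to cut the cost on a 12MP frame.

```javascript
// Hypothetical helper: average colour of a flat RGBA byte array
// (4 bytes per pixel). `step` samples every n-th pixel to reduce work.
function averageColor(rgba, step = 1) {
  let r = 0, g = 0, b = 0, count = 0;
  for (let i = 0; i + 3 < rgba.length; i += 4 * step) {
    r += rgba[i];
    g += rgba[i + 1];
    b += rgba[i + 2];
    count++;
  }
  if (count === 0) return { r: 0, g: 0, b: 0 };
  return { r: r / count, g: g / count, b: b / count };
}

// Two pixels: one pure red, one pure blue.
const pixels = Uint8Array.from([255, 0, 0, 255, 0, 0, 255, 255]);
console.log(averageColor(pixels)); // { r: 127.5, g: 0, b: 127.5 }
```

Even with sampling, the decode step itself still runs on the JS thread for every photo, which is why a frame processor is likely the better fit at 10 frames per second.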

How to change aspect ratio of photos taken using AVFoundation?

I am using AVFoundation to take pictures instead of UIImagePicker due to how customizable the user interface presented to the user can be. When using it the aspect ratio that the picture is saved as is the same as the iPhone's video feed. What I want to happen is to have the pictures saved in the same aspect ratio as normal pictures are.
The way that I am currently approaching this is to overlay a black bar in the excess preview display and then just crop the photo after saving it as an image.
However, this feels very crude. I assume that it is a common thing to use the AVFoundation as a way of taking photos and so I assume I must be missing something!
I have used this example code. And I have read through the AVFoundation documentation but can only assume that I am missing a function. I have also read through similar questions to this which describe the process by which I might go about cropping images, but that isn't really my concern.
On the other hand, if there is no standard way to do this, please do let me know so that I can stop worrying that I am approaching it in a convoluted way.
Also, I am using Objective-C so if answers contain code, please could you use the same language?

How do I process video frames in HTML5 quickly?

I am testing HTML5's video API. The plan is to have a video play with an effect, like making it black and white. I have the video and canvas elements working together using a buffer: I take the current video frame and copy it to a scratch buffer where I can process it. The problem is the rate at which it runs.
The Video API of HTML5 has the 'timeupdate' event. I tried using this to have the handler process frames, once for every frame, but it runs at a slower rate than the video.
Any ideas to speed up processing frames?
You can get much more frequent redraws by using requestAnimationFrame to determine when to update your canvas, rather than relying on timeupdate, which only updates every 200-250ms. It's definitely not enough for frame-accurate animation. requestAnimationFrame will update at most every 16ms (approx 60fps), but the browser will throttle it as necessary and sync with video buffer draw calls. It's pretty much exactly what you want for this sort of thing.
Even with higher frame rates, processing video frames with a 2D canvas is going to be pretty slow. For one thing, you're processing every pixel sequentially on the CPU, running JavaScript. The other problem is that you're copying around a lot of memory. There's no way to directly access pixels in a video element. Instead, you have to copy the whole frame into a canvas first. Then, you have to call getImageData, which not only copies the whole frame a second time, but also has to allocate the whole block of memory again, since it creates a new ImageData every time. It would be nice if you could copy into an existing buffer, but you can't.
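For reference, the 2D canvas approach described above looks roughly like the sketch below. The function names and the video and canvas arguments are hypothetical; the luma weights are the common Rec. 601 ones, chosen here as an assumption. The pixel loop is a plain function, so it can be tried outside the browser as well.

```javascript
// Convert RGBA pixel data to grayscale in place (Rec. 601 luma weights).
function toGrayscale(data) {
  for (let i = 0; i + 3 < data.length; i += 4) {
    const luma = Math.round(
      0.299 * data[i] + 0.587 * data[i + 1] + 0.114 * data[i + 2]
    );
    data[i] = data[i + 1] = data[i + 2] = luma; // alpha at i + 3 is untouched
  }
  return data;
}

// Browser-only part: copy the current video frame to the canvas, filter it,
// and schedule the next pass with requestAnimationFrame.
function startGrayscaleLoop(video, canvas) {
  const ctx = canvas.getContext('2d');
  function draw() {
    ctx.drawImage(video, 0, 0, canvas.width, canvas.height);
    const frame = ctx.getImageData(0, 0, canvas.width, canvas.height);
    toGrayscale(frame.data);
    ctx.putImageData(frame, 0, 0);
    requestAnimationFrame(draw);
  }
  requestAnimationFrame(draw);
}
```

Every pass through draw performs the two full-frame copies and the fresh ImageData allocation mentioned above, which is where most of the time goes on large videos.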
It turns out you can do extremely fast image processing with WebGL. I've written a library called Seriously.js for exactly this purpose. Check out the wiki for a FAQ and tutorial. There's a Hue/Saturation plugin you can use - just drop the saturation to -1 to get your video to grayscale.
The code will look something like this:
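// Build the effect graph: video -> hue/saturation -> canvas
var composition = new Seriously();
var effect = composition.effect('hue-saturation');
var target = composition.target('#mycanvas');
effect.source = '#myvideo';
// A saturation of -1 removes all color, which gives grayscale
effect.saturation = -1;
target.source = effect;
// Start the render loop
composition.go();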
The big downside of using WebGL is that not every browser or computer will support it - Internet Explorer is out, as is any machine with old or weird video drivers. Most mobile browsers don't support it. You can get good stats on it here and here. But you can get very high frame rates on pretty large videos, even with much more complex effects.
(There is also a small issue with a browser bug that, oddly enough, shows up in both Chrome and Firefox. Your canvas will often be one frame behind the video, which is only an issue if the video is paused, and is most egregious if you're skipping around. The only workaround seems to be to keep forcing updates, even if your video is paused, which is less efficient. Please feel free to vote those tickets up so they get some attention.)

Finding significant images in a set of surveillance camera images

I've had theft problems outside my house, so I set up a simple webcam to capture a frame every second with Dorgem (http://dorgem.sf.net).
Dorgem does offer a feature to use motion detection to only capture frames where something is moving on the screen. The problem is that the motion detection algorithm it uses is extremely sensitive. It goes off because of variations in color between successive shots on my cheap webcam, and it also goes off because the trees in front of the house are blowing in the wind. Additionally, the front of my house is a high traffic area so there is also a large number of legitimately captured frames.
On average, Dorgem's motion detection still captures about 2800 of every 3600 frames (one hour at one frame per second). This is too much for me to search through to find out where the interesting activity is.
I wish I could re-position the camera to a more optimal position where it would only capture the areas I'm interested in, so that motion detection would be simpler, however this is not an option for me.
I think that because my camera has a fixed position and each picture frames the same area in front of my house, then I should be able to scan the images and figure out which ones have motion in some interesting region of that image, throwing out all other frames.
For example: if there's a change in pixel 320,240 then someone has stepped in front of my house and I want to see that frame, but if there's a change in pixel 1,1 then it's just the trees blowing in the wind and the frame can be discarded.
I've looked at pdiff, a tool for finding diffs in sets of pictures, but it seems to be focused on diffing the entire picture rather than a specific region of it.
I've also looked at phash, a tool for calculating a hash based on human perception of an image, but it seems too complex.
I suppose I could implement it in a shell script, using ImageMagick's mogrify -crop to cherry-pick the regions of the image I'm interested in, then running pdiff on the cropped images to pick out the interesting frames.
Any thoughts? ideas? existing tools?
Cropping and then using pdiff seems like the best choice to me.
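If a script ends up doing the comparison itself rather than calling pdiff, the region-of-interest idea from the question can be sketched as follows. This is not how pdiff works; it is a plain per-pixel difference, and it assumes the frames have already been decoded into grayscale arrays of equal size. All names, the threshold of 25, and the minimum pixel count are hypothetical and would need tuning for a noisy webcam.

```javascript
// Count pixels inside `region` whose brightness changed by more than
// `threshold` between two grayscale frames stored row by row.
function countChangedPixels(prev, curr, width, region, threshold) {
  let changed = 0;
  for (let y = region.y; y < region.y + region.height; y++) {
    for (let x = region.x; x < region.x + region.width; x++) {
      const i = y * width + x;
      if (Math.abs(curr[i] - prev[i]) > threshold) changed++;
    }
  }
  return changed;
}

// A frame is worth keeping when enough pixels in the region changed.
function isInteresting(prev, curr, width, region, threshold = 25, minChanged = 1) {
  return countChangedPixels(prev, curr, width, region, threshold) >= minChanged;
}

// 4x4 example: a change at (0,0) is outside the watched 2x2 corner region.
const before = new Uint8Array(16);
const after = new Uint8Array(16);
after[0] = 200;
const doorway = { x: 2, y: 2, width: 2, height: 2 };
console.log(isInteresting(before, after, 4, doorway)); // false
```

The threshold absorbs small color variations between successive shots, and restricting the loop to the region ignores the trees entirely.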

How Can I Clone a Camera Source DirectShow Filter?

I'm doing some stereoscopic work, which means I need to work with two instances of various filters (e.g. a camera source that receives an IP stream), and this is proving not to be trivial.
I even tried copying the IPCamfilter.ax to IPCamfilter.ax and manually making new CLSID entries in the registry. The clone shows up, but won't work. Any ideas?
Should I edit the cloned filter's binary to change its CLSID and then register it? Or is there a simple way to use GraphEdit to do this?
Do you work with two cameras, or with one camera from which you want two pictures?
In the first case, there are some filters that work with only one connected device (with FireWire, for example, the cameras have to be connected to two different controllers).
In the latter case, you can use the Infinite Pin Tee Filter to get two streams from the one device. You can test that in GraphEdit as well.
There's nothing in COM that prevents you creating two instances of the same clsid, so you're solving the wrong problem by trying to change the clsid. There must be something in the filter internals that prevents multiple use in the same process.
If you can't get access to the source to fix it, you could have two capture graphs in separate processes and then use a bridge of some sort to combine the two outputs in a third graph (or in your application).
SplitCam is a freeware virtual video clone and video driver for connecting several applications to a single video capture source. Usually, if you have a camera connected to your PC, you cannot use it in more than one application at the same time, and there is no standard Windows option that makes this possible. SplitCam lets you share your video source across any conferencing software such as ICQ, Yahoo, MSN Messenger, or whatever.
Video Processing Filter is a transform filter that can rotate the video by 90, 180, or 270 degrees (keeping the aspect ratio for 90 and 270 degree rotations), flip the video, convert an RGB video stream to grayscale, and invert colors. It works in any DirectShow-based application.