I need to get access to the actual camera on the iPhone and take a 'snapshot' from it every .X seconds.
On each update I'll run image processing on the frame (a modified version of OpenCVCircles), and if the conditions are met, exit the camera and take certain actions; if they are not met, I'll simply stay in the camera. The condition is a specific configuration of a series of circles that the user has to look at through the camera.
I know this would be easy if I forced the user to take a picture and then grabbed the result from the UIImagePicker. However, I think it would be much better to do it automatically for the user as soon as the pattern is in view.
Is it possible to do this without completely writing my own camera with the AVCapture classes?
Look into AVCaptureSession, AVCaptureDevice, and related classes. They should provide what you need; we are using them here to provide a live video feed over the network.
Edit
From your question edit I see this answer does not apply directly to your use case, yet it is the only means I know of that will allow you to accomplish what you seek.
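For reference, here is a rough sketch of that approach in Swift; the class name, the snapshotInterval property, and the 0.5-second default are all invented here for illustration. An AVCaptureSession feeds an AVCaptureVideoDataOutput, and the sample buffer delegate throttles itself so that only one frame every .X seconds is handed to your (modified) OpenCVCircles detection:

import AVFoundation
import CoreMedia

final class SnapshotCamera: NSObject, AVCaptureVideoDataOutputSampleBufferDelegate {
    private let session = AVCaptureSession()
    private let frameQueue = DispatchQueue(label: "camera.frames")
    private var lastProcessed = Date.distantPast
    var snapshotInterval: TimeInterval = 0.5   // your ".X seconds"

    func start() throws {
        guard let device = AVCaptureDevice.default(for: .video) else { return }
        let input = try AVCaptureDeviceInput(device: device)
        let output = AVCaptureVideoDataOutput()
        output.alwaysDiscardsLateVideoFrames = true
        output.setSampleBufferDelegate(self, queue: frameQueue)

        session.beginConfiguration()
        if session.canAddInput(input) { session.addInput(input) }
        if session.canAddOutput(output) { session.addOutput(output) }
        session.commitConfiguration()
        session.startRunning()
    }

    func captureOutput(_ output: AVCaptureOutput,
                       didOutput sampleBuffer: CMSampleBuffer,
                       from connection: AVCaptureConnection) {
        // Throttle: only inspect one frame per snapshotInterval.
        guard Date().timeIntervalSince(lastProcessed) >= snapshotInterval,
              let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else { return }
        lastProcessed = Date()

        // Run the circle detection on pixelBuffer here; if the configuration
        // matches, stop the session and take your action.
        // session.stopRunning()
        _ = pixelBuffer
    }
}

You can still show the live camera to the user by attaching an AVCaptureVideoPreviewLayer to the same session, so there is no need to fall back to UIImagePickerController.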
I am interested in getting into user interaction/shape detection with a simple USB webcam. I can use multiple webcams, but I don't want to be restricted to something like the Kinect sensor. My detection cameras need to be set up on either side of a helmet (or, if there is only one, on top). I have found some libraries, but they don't really have the functionality I need, and most are geared towards facial recognition. I need to be able to detect a basic human skeletal structure and determine if something is obstructing it. I would really rather do it without any sort of marker system on the target person, and I would like it to be able to track multiple structures. Obviously I am willing to do tweaking if necessary, but I want to see how close I can get to what I need before I reinvent the wheel. I am trying to design an AI system that can determine how many people are in an area and where they are.
I doubt there will be anything like this, since Microsoft spent a ton of money on the R&D for Kinect and it's probably all locked behind an NDA. I'm also guessing there's a lot of hardware in the Kinect that is not available in a standard webcam.
The closest thing I could find to what you're looking for is the OpenKinect project; it might be a good place to start your research.
This is by no means a showstopper problem, just something I've been curious about for some time.
There is the well-known -[UIImage resizableImageWithCapInsets:] API for creating resizable images, which comes in really handy when texturing variable-size buttons and frames, especially on the Retina iPad, and especially if you have lots of them and want to avoid bloating the app bundle with image resources.
The cap insets are typically constant for a given image, no matter what size we want to stretch it to. We can also put it this way: the cap insets are characteristic of a given image. So here is the thing: if they logically belong to the image, why don't we store them together with the image (as some kind of metadata), instead of having to specify them everywhere we create a new instance?
In daily practice this could have serious benefits, mainly by eliminating the possibility of human error in the process. If the designer who creates the images could embed the appropriate cap values in the image file itself upon export, then developers would no longer have to write magic numbers in the code and keep them updated each time the image changes. The resizableImage API could read and apply the caps automatically. Heck, even a category on UIImage would do.
Thus my question is: is there any reliable way of embedding metadata in images?
I'd like to emphasize these two words:
reliable: I have already seen some entries about optional PNG chunks, but I'm afraid those are wiped out of existence once the iOS PNG optimizer kicks in. Or is there a way to prevent that (while still letting the optimizer do its job)?
embedding: I have thought of including the metadata in the filename, similarly to what Apple does with "@2x", "~ipad", etc., but having kilometer-long names like "image-20.0-20.0-40.0-20.0@2x.png" just doesn't seem to be the right way.
Can anyone come up with a smart solution to this?
Android has a file type called nine-patch that is basically the image plus the metadata needed to stretch it. Perhaps a class could be made to replicate it: http://developer.android.com/reference/android/graphics/NinePatch.html
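Until something like nine-patch exists on iOS, one workaround is to ship a small sidecar plist maintained by the designer and wrap the lookup in a UIImage category/extension, so the cap values never appear as magic numbers in code. A sketch in Swift follows; the CapInsets.plist file, its key names, and the resizableNamed helper are all invented for illustration, and note that this stores the metadata next to the image rather than inside it:

import UIKit

extension UIImage {
    // Looks up "name" in a bundled CapInsets.plist whose entries look like
    //   name -> { top: 20, left: 20, bottom: 40, right: 20 }
    // and returns the image already made resizable with those caps.
    static func resizableNamed(_ name: String) -> UIImage? {
        guard let image = UIImage(named: name) else { return nil }
        guard let url = Bundle.main.url(forResource: "CapInsets", withExtension: "plist"),
              let data = try? Data(contentsOf: url),
              let table = try? PropertyListSerialization.propertyList(from: data, format: nil)
                              as? [String: [String: Double]],
              let caps = table[name]
        else { return image }   // no metadata found: fall back to the plain image

        let insets = UIEdgeInsets(top: CGFloat(caps["top"] ?? 0),
                                  left: CGFloat(caps["left"] ?? 0),
                                  bottom: CGFloat(caps["bottom"] ?? 0),
                                  right: CGFloat(caps["right"] ?? 0))
        return image.resizableImage(withCapInsets: insets)
    }
}

A closer replication of Android's nine-patch would encode the caps in a one-pixel border of the PNG itself and read those pixels back at load time; that survives being stored "in" the image, but it means decoding the pixel data yourself and keeping the border out of the drawn area.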
The Kinect sensor raises many events per second, and if you are not fast enough at processing them (for example when trying to animate a true 3D character), you get stuck within a few frames.
What is the best approach to handling only a reasonable number of events, without blocking the user interface?
Thanks.
I would suggest requesting frames in a loop instead of using the event method.
To do this, in your animation loop just call:
sensor.DepthStream.OpenNextFrame(millisecondsWait);
Or:
sensor.SkeletonStream.OpenNextFrame(millisecondsWait);
Or:
sensor.ColorStream.OpenNextFrame(millisecondsWait);
Event-driven programming is great, but when you run into problems like the one you mention, it is better to just call the functions when you need them.
I'd say that if you're animating something really quick and elaborate (e.g. a complex 60 fps 3D scene), the time it takes to get the image from the camera synchronously might create hiccups in your rendering.
I'd try splitting the rendering and the Kinect processing/polling into separate threads; with that approach you could even keep using the 30 fps event-driven model.
I've got a very non-standard AVFoundation question, and as a relative newbie to the iOS world I could really use some guidance from the experts out there.
I'm working on an app that lets the user record bits of audio, which I need to programmatically arrange using AVMutableComposition. Here's the thing: in addition to the audio track, I want to capture and save accelerometer data and have it synced with the sound. Typically AVFoundation is used for known media types like still photos, audio, and video, but I was wondering whether it's feasible to capture something like accelerometer data using this framework. It would make it much easier for me to sync the sensor data with the captured audio, especially when putting the parts together with AVMutableComposition.
Here is what I need to accomplish:
1. Record accelerometer data as an AVAsset/AVAssetTrack so I can insert it into an AVMutableComposition.
2. Allow for playback of the accelerometer data in a custom view, alongside the audio it was recorded with.
3. Save the AVMutableComposition to disk, including both audio and accelerometer tracks. It would be nice to use a standard container like QuickTime.
For parts 1 & 3 I'm looking at using the AVAssetReader, AVAssetReaderOutput, AVAssetWriter, and AVAssetWriterInput classes to capture from the accelerometer, but without much experience with Cocoa I'm trying to figure out exactly what I need to extend. At this point I'm thinking I need to subclass AVAssetReaderOutput and AVAssetWriterInput and work with CMSampleBuffers to allow the conversion between the raw accelerometer data and an AVAsset. I've observed that most of these classes only have a single private member referencing a concrete implementation (i.e. AVAssetReaderInternal or AVAssetWriterInputInternal). Does anyone know whether this is a common pattern, or what it means for writing a custom implementation?
I haven't yet given part 2 much thought. I'm currently using an AVPlayer to play the audio but I'm not sure how to have it dispense sample data from the asset to my custom accelerometer view.
Apologies for such an open-ended question; I suppose I'm looking more for guidance than a specific solution. Any gut feelings as to whether this is at all possible with AVFoundation's architecture?
I would use an NSMutableArray and store the accelerometer data plus the time code there. When you play back, you can get the current time from the player and use it to look up the accelerometer data in the array. Since the data is stored as a timeline, you don't have to search the whole array; it is enough to step forward through the array and check when the time values coincide.
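A minimal sketch of that idea in Swift (the type names and the 60 Hz sample rate are just illustrative): samples are appended in chronological order while recording with CMMotionManager, and during playback a cursor steps forward through the array, driven by the player's current time:

import Foundation
import CoreMotion

// One timestamped sample; the names here are illustrative.
struct AccelerometerSample {
    let time: TimeInterval            // seconds since recording started
    let acceleration: CMAcceleration
}

final class AccelerometerTimeline {
    private(set) var samples: [AccelerometerSample] = []
    private var cursor = 0            // index of the last sample handed out

    // Recording: append samples in chronological order.
    func startRecording(with manager: CMMotionManager, startDate: Date) {
        manager.accelerometerUpdateInterval = 1.0 / 60.0
        manager.startAccelerometerUpdates(to: .main) { [weak self] data, _ in
            guard let data = data else { return }
            let t = Date().timeIntervalSince(startDate)
            self?.samples.append(AccelerometerSample(time: t,
                                                     acceleration: data.acceleration))
        }
    }

    // Playback: step forward from the cursor until we pass the playback time.
    // Because the array is chronological, there is no need to search it all.
    func sample(at playbackTime: TimeInterval) -> AccelerometerSample? {
        while cursor + 1 < samples.count, samples[cursor + 1].time <= playbackTime {
            cursor += 1
        }
        return samples.isEmpty ? nil : samples[cursor]
    }
}

On each display refresh (for example from a CADisplayLink) you would call sample(at: player.currentTime().seconds) with your AVPlayer and hand the result to the custom accelerometer view.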
In a game, as far as camera-controlling scripts go, would you have one "camera control script" that handles all the cameras, or a script for each individual camera (player camera, cutscene cameras, etc.)?
I'm just trying to figure out the proper way to handle scripting in Unity that will keep things simple in the long run as well as be memory efficient.
Camera actions would be following the player for the player camera, panning around for one cutscene, or staying still for a different cutscene.
It really depends on the scenario/workflow/production pipeline; it doesn't really matter which way you organise it, but if you are less comfortable with scripting I would say putting it all in one file is probably the best way to deal with it for the moment.
I would normally say that if you plan to reuse segments of code, it's generally a good idea to keep things modular, i.e. one class per script.
EDIT: For your specific problem, I would have a system in which you switch between one (or more) camera/s that translate between scripted points, i.e. stored Vector3 variables (for static cameras and cinematics), and the main camera, which has the movement script attached to it. This way you minimise the number of cameras you are using in your scene and you can reuse the static cameras as many times as you need to.
A tidy way to deal with the data storage might be to use an array of structs that stores all the data you need, i.e. the camera's position, rotation, FOV, etc.