New to Mac OS X, familiar with Windows. Windows has DirectShow, a good number of built-in filters, COM programming, and GraphEdit for very fast prototyping and snooping on the graphs you've constructed in code.
I'm now about to go to the Mac to work with cameras, webcams, microphones, color spaces, files, splitting, synchronization, rendering, file reading, file saving, and many of things I've come to take for granted with DirecShow when putting together applications for live performance. On the Mac side, so far I've found ... nothing! Either I don't know where to look or I'm having the toughest time tying the Mac's reputation for its ease of handling media with a coherent programmatic ability to get in there and start messin' with media manipulatin' building blocks.
I've seen some weak suggestions to use gstreamer or some library for QT but I can't bring myself to believe that this is the Apple way to go. And I've come across some QuickTime documentation but I'm not looking to do transitions, sprites, broadcasting, ...
Having a brain trained on DirectShow means I don't even know how Apple thinks about providing DirectShow-like functionality. That means I don't know the right keywords and don't even know where to look. Books? Bought a few. Now I might be able to write some code that can edit your sister's wedding video (if I can't make decent headway on this topic I may next be asking what that'd be worth to you), but for identifying what filters are available and how to string them together ... nothing. Suggestions?
Video handling is going through a huge transition on the Mac at the moment. QuickTime is very old, but also big and powerful, so it's been undergoing an incremental replacement process for the past 5 years or so.
That said, QTKit is the QuickTime subset (capture, playback, format conversion and basic video editing) which is supported going forward. The legacy QuickTime APIs are still there for the moment, and probably will remain at least until its major features are available elsewhere, but are 32-bit only. For some involved video stuff you may end up needing to use it in places.
At the moment, iOS is ahead of the Mac because it could start from scratch with AV Foundation. The future of the Mac media frameworks will probably either be AV Foundation directly (with QTKit being a lightweight shim over the top) or an extension of QTKit that looks very similar.
For audio there's Core Audio which is on Mac and iOS and isn't going away any time soon. It's quite powerful but somewhat obtuse in places. Luckily online support is very good; the mailing list is an essential resource.
For filters and frame-level processing you've got Core Video as someone else mentioned, as well as Core Image. For motion graphics there's Quartz Composer which includes a graphical editor and a plugin architecture to add your own patches. For programmatic procedural animation and easily mixing rendering models (OpenGL, Quartz, video, etc.) there's Core Animation.
In addition to all of these, of course there's no reason you can't use open source libraries where the built-in stuff doesn't do what you want.
To address your comment below:
In QuickTime (and QTKit), individual data types like audio and video are represented as tracks. It may not be immediately clear that QuickTime can open audio as well as video file formats. A common way to combine audio and video would be:
Create a QTMovie with your video file.
Create a QTMovie with your audio file.
Take the QTTrack object representing the audio and add it to the QTMovie with the video in it.
Flatten the movie, so it doesn't simply contain a reference to the other movie but actually contains the audio data.
Write the movie to disk.
Here's an example from Blender. You'll see how the A/V muxing is done in the end_qt function. There's also some use of Core Audio in there (AudioConverter*). (There's some classic QuickTime export code in quicktime_export.c but it doesn't seem to do audio.)
Related
There are so many formats that intuitively are more bound to graphics: it could be vector graphics format such as SVG, various PNG, JPEG formats etc. Why PDF of them all has a dedicated library?
Note: this question is NOT opinion-based. I'm interested in the very reason for why PDF has such a central point in xOS. Give it a day to see.
My bonafides: I licensed my PDF viewer to Apple for Rhapsody and then the PDF library I wrote in Objective-C to Apple as the basis of CoreGraphics. I’ve written code for NeXT systems since 1988, and worked as a contractor for NeXT and Apple many times.
As Rob says, NeXTstep used Display Postscript (“DPS”), under license from Adobe. When Apple switched from “Classic” Mac OS to OS X, they absolutely didn’t want to keep paying Adobe the per-seat license for DPS each Mac it sold – it had been years since they’d agreed to pay a per-seat license to anyone for the Mac.
And, at the same time, DPS didn’t reflect modern graphics trends. It was a Turing-complete language and ran in its own process, and client apps would have to send inter-process messages to it, which meant expensive context switches and no direct sharing of buffers. (Also it broke down isolation between client processes.)
So the graphics team (led then by Peter Graffagnino) needed to replace DPS, but also not break all the existing NeXTstep apps. As it happened an engineer at Adobe (I believe it was originally Sam Streeper) had written a graphics language that was MUCH simpler than PostScript and wasn’t Turing complete, but still had most of the drawing conventions of PostScript. (Display PostScript was superset of PostScript, so any PostScript program runs in DPS.)
PDF was originally designed for PostScript-driven printers, because lots of PostScript programs (aka “print jobs”) ended up slurping down memory and processor time and often were just too big to work on any individual printer (RAM was expensive back then). Converting PostScript files to PDF made them process far more easily, and since PDF doesn’t have control flow you couldn’t, say, send the printer into an infinite loop. (Adobe processed PDF with a PostScript program they’d send to the printer as well.)
PDF was also an open standard, so with Apple writing their own implementation they didn’t have to pay Adobe anything to use it. PDF was fast and simple and free and had the exact same model because it was designed to run on PostScript printers.
So: Apple had to switch to a new graphics language for Mac OS X (née NeXTstep), and wanted one with the same semantics and Display PostScript so all their existing apps largely still worked.
So first they wrote the functional calls to CoreGraphics, implementing all the conventions of PostScript / PDF (eg: how clip paths work, how paths are filled if they self-overlap, resolution-independence, the state stack, etc). They rewrote all the existing high-level graphics functions (eg, NSRectFill()) so they were on top of CoreGraphics, so 99% of existing NeXTstep programs could switch from DPS to CG without modification. (By the late 1990s most NeXTstep programmers had realized dipping into DPS and trying to create effects by writing PostScript was pretty much always a bad idea, so we used the high-level calls.)
But the final missing piece was giving programs a format they could read and write to disk. In NeXTstep days we used both PostScript and Encapsulated PostScript, but the former was Right Out now. PDF was a natural fit (for the reasons outlined above) so they added reading and writing of PDF files.
Postscript (pun intended): The PDF code in CoreGraphics today is not what I licensed Apple — they used my code as more of a “working reference” and came up with their own APIs, since CoreGraphics couldn’t use Objective-C and my API was never really intended to be used by general programs.
The drawing model utilized by Quartz 2D is based on PDF specification.
It is widely stated that Quartz "uses PDF internally" (notably by Apple in their 2000 Macworld presentation and Quartz's early developer documentation), ... Quartz's internal imaging model correlates well with the PDF object graph, making it easy to output PDF to multiple devices.
https://en.wikipedia.org/wiki/Quartz_(graphics_layer)#Use_of_PDF
https://developer.apple.com/library/content/documentation/GraphicsImaging/Conceptual/drawingwithquartz2d/dq_overview/dq_overview.html
Continuing Mahal Tertin's points, focusing on the drawing side, which isn't really PDF-centric but the history is interesting anyway (I'll get to the file format side below).
Quartz was developed before 1999, and was a fairly natural continuation of NeXT's use of Display Postscript and QuickDraw, which conceptually goes back to the Lisa and was used extensively on pre-OS X Macs.
SVG might have been a reasonable base, but work on it didn't even start until 1999. It was never in a position to the basis for OS X drawing. Even if it had come out a couple of years earlier, it would still have been an untested experimental format vs a established format that Apple already had extensive code and experience for.
When iOS came along, it inherited Core Graphics, which made development dramatically easier for OS X developers. Converting everything to an SVG DOM or the like would have made iOS much harder to migrate to. (I had complex custom views on OS X that I ported verbatim to iOS with the addition of one flip translation and a hacky #define to replace NSColor with UIColor.) It also made developing iOS much easier in the first place since Apple could reuse substantial existing code. Why replace the entire drawing system with another format? To what purpose?
Quartz/PDF/Postscript all have the concept of independent objects/layers that are composited together, and has lots of support for text and lines. This makes a lot of sense when you're building a windowing system and want to easily drag a window along the screen, overlapping other windows, and minimizing computation. PNG and JPEG are built to compress single, static images. They don't make any sense for this use. JPEG in particular would be a horrible way to try to manage a windowing system full of crisp, straight lines and text, exactly the things that JPEG hates.
So that's the drawing side (which isn't really PDF-centric anyway, but it answers why SVG DOM isn't the heart of Core Graphics), but why does PDF get its own framework and not PNG?
As a graphics format, PNG does get a lot of support in iOS (as opposed to OS X). iOS is highly optimized to display PNG and converting images to and from PNG is very easy in iOS. There really isn't a need for a PNGKit. What would it do beyond what's built into UIKit, where PNG and JPEG are given their own special handling right in UIImage (handling PDF doesn't get)?
PDFKit exists because PDF is complicated, and it addresses printing problems that the other formats don't (things like handling multiple pages; when was the last time someone mailed you an SVG Print document?) While it may seem that it's getting a lot of special support by getting its own framework, PDFKit isn't really very good and doesn't support a lot of more complicated PDF features. So while there is a long history of PDF inside of Core Graphics, PDF as a file format is actually pretty poorly supported by Apple IMO. (Compare PSPDFKit, which much more powerful, and a necessary addition for almost any app that uses PDF in a non-trivial way.)
I recently moved from a PC to a MacBook Pro. I'm starting to go through tutorials on Objective-C and developing in Cocoa. I do a lot of image processing algorithm development work (pixel by pixel manipulation) in my day job so I'd like to get create a test image processing app or two for OS X. I'm struggling to figure out where to start - let's say I want to create a simple application (that I could reuse) like the following:
load an image from an open file option within a file menu
display this within the GUI.
Click a button to apply pixel by pixel processing
Update the displayed image
Save the processed image from the save option within the file menu
Any pointers or links would be most appreciated.
Thanks
Other info:
I'm pretty familiar with OpenCV within Linux - haven't looked at using it within Objective-C/Cocoa/Xcode environment yet though - not even sure if this would be a good idea?
I guess it would be nice to use GPU acceleration as well, but I'm not familiar with OpenGL/OpenCL - so I might have to put that one on the long finger for the moment.
As you are looking at the Apple platform, you should look into the CoreImage framework - it will provide you most of pre-baked cookies ready to be consumed in your application.
For more advanced purposes, you can start off with openCV.
Best of luck!!
As samfisher suggests, OpenCV is not that hard to get working on the Mac, and Core Image is a great Cocoa framework for doing GPU-accelerated image processing. I'm working on porting my GPUImage framework from iOS to the Mac, and it's entirely geared around making accelerated image processing easy to work with, but unfortunately that isn't working right now.
If you're just getting started on the Mac, one tool that I can point out which you might overlook is Quartz Composer. You have to download the separate Graphics Tools package from Apple's developer site to install Quartz Composer, because it's no longer shipped with Xcode.
Quartz Composer is a graphical development tool that lets you drag and drop modules, connect inputs and outputs, and do rapid development of some fairly interesting things. One task it's great for is doing rapid prototyping of image processing, either using Core Image or OpenGL shaders. I've even heard of people using OpenCV with this using custom patches. You can easily connect an image or camera source into a filter chain, then edit the filters and see live updates as you work on them, without requiring a compile-run cycle.
If you want some sample QC projects to play with, I have a couple of them linked from this article I wrote a couple of years ago. They both do the same color-based object tracking, with one using Core Image and the other OpenGL shaders. You can dig into that and play around to see how that works, without having to get too far into writing any code.
I am writing a simple application for streaming video over the network, using a slightly different from the ordinary "H.264 over RTP" approach (i am using my own codecs).
To achieve this, i need raw frames and raw audio samples that QTMovie, when playing back a movie, implicitly sends to QTMovieView.
The most common way to retrieve raw video frames is to use VisualContext - and then, using a display link callback, i "generate" a CVPixelBufferRef, using this VisualContext. So i am getting frames with some frequency that is synchronized with my current refresh rate (not that i need this synchronization - i only need to have a "stream" of frames that i can transmit over the network - but CoreVideo Programming Guide and most Apple samples related to video promote this approach).
The first problem i have faced with - is when i attach a VisualContext to a QTMovie, the picture can't be rendered onto the QTMovieView anymore. I don't know why does this happen (i guess it's related to the idea of GWorld and the rendering being "detached" from it when i attach VisualContext). Ok, at least i have frames, which i could render onto a simple NSView (though this sounds wrong, and performance-unfriendly. Am i doing it right?)
What about the sound, i have no idea what to do. I need to get raw samples of sound as the movie being played (ideally - something similar to what QTCaptureDecompressedAudioOutput returns in its callback).
I have prepared myself to delving into deprecated Carbon QuickTime APIs, if there is no other way. But I don't know even where to start. Should i use the same CoreVideo Display link and periodically retrieve sound somehow? Should i get QTDataReference and locate the sound frames manually?
I am actually a beginner with programming video and audio services. If you could share some experience i would REALLY appreciate any idea you could share with me :)
Thank you,
James
Is there any code out there, that I can use in Cocoa, to recognize text from photos? Let's say I snap a photo with my iPhone of a page of a book. I'd like to capture the text in it.
There is the Tesseract OCR toolkit that is an open source OCR engine, currently maintained by Google. "Olipion" created a cross compilation tutorial to get in on the iPhone. I would say that this is a good place to start.
However, there are reasons why you might not want to to OCR on the Phone even if you could. Some of these include:
Even the new iPhone 4's processor is not that fast and since you app can't really run in the background doing the processing, the user experience might not be optimal.
Running OCR on a mobile device would probably be a killer for battery life.
Every time you would want to update the OCR engine everybody who installed your app would have to upgrade.
For an always connected mobile device running the OCR on a server somewhere would be probably better. You could upgrade your OCR software easily, you could run much more powerful algorithms then a mobile device could handle and so on.
I am not so sure that you would be able to get good results from photos taken using a mobile camera -- accuracy of OCR systems goes way down with the kind of poorly lit, noisy, distorted images likely to be captured using a phone camera.
As far as commercial products out there, there is Evernote that gives you a OCR capability if you buy their premium service.
As an alternative to machine OCR, there is always Mechanical Turk, where you could pay people small amount to do the OCR for you. Would probably do better at transcription given the image source.
I'm interested in writing some homebrew code for the Microsoft Kinect console. I have a few applications which I think would translate well to the platform. I've been toying with the idea of giving it a shot using the OpenKinect drivers and libraries. Obviously this would be a lot of work, but I am wondering just how much. Does anyone have experience with OpenKinect? Do you get only the raw video/audio data from the device, or has anyone written higher level abstractions to make common tasks easier?
The OpenKinect library is basically a driver — at least for now — so don't expect much high functions from it. You will more or less get the raw data from both the depth and the video cameras.
This is basically an array received in a callback function each time a frame arrives.
You can give it a try by following the instructions provided on the OpenKinect website, it's really quick to install and try it, and you can play a bit with the glview application provided to get a feeling of what's possible.
I've set up a few demos using opencv, and got pretty cool results even though I didn't have much background in computer vision so I can only encourage you to try it yourself!
Alternately, if you're looking for more advanced functions, the OpenNI framework was just released this week and provides some impressive high level algorithms such as skeleton tracking and some gesture recognition. Part of the framework is proprietary algorithms from PrimeSense (like the powerful skeleton tracking module...). I haven't tried it yet and don't know how well it integrates with the kinect and the different OS, but since a bunch of guys from different groups (OpenKinect, Willow Garage...) are working hard on it that shouldn't be an issue within a week.
Elaborating further on what Jules Olleon wrote, i've worked with OpenNI (http://www.openni.org) and the algorithms above it (NITE), and I highly recommend using these frameworks. Both frameworks are well-documented, and come with numerous samples from which you can start out.
Basically, OpenNI abstracts the lower-level details of working with the sensor and its driver for you, and gives you a convenient way to get what you want from a "generator" (e.g. xn::DepthGenerator for getting the raw depth data). OpenNI is open-source and free to use in any application. OpenNI also handles the platform-abstraction for you. As of today, OpenNI is supported and works fine for Windows 32/64 and linux, and is in the process of being ported to OSX. Bindings are available for use in multiple programming languages (C, C++, .NET, Python, and a few others I believe).
NITE has additional interfaces built above OpenNI, which give you higher-level results (e.g. track a hand-point, skeletons, scene analysis etc). You'll want to check the subtleties of NITE's license regarding when/where you can use it, but it's still probably the easiest and fastest way to get analysis (e.g. skeleton) for now. NITE is closed-source, so PrimeSense need to supply a binary version for you to use. Currently windows and linux versions are available.
I haven't worked with with OpenKinect but I've been working with OpenNI and SensorKinect for a few months now for my research. If you are planning to work with raw data from Kinect, they work great in giving you depth and video (they don't support motor control). I've used it with C++ and OpenGL in both Windows 64bit and Ubuntu 32bit with almost no modifications to the code. It's very easy to learn if you know basic c++. Installing it might be a little headache.
For more advanced features such as skeleton detection, gesture recognition, etc., I highly recommend using the middlewares such as NITE with OpenNI or the ones provided in here: Middlewares developed around OpenNI rather than re-inventing the wheel. Nite is also very easy to use once you have OpenNI working; e.g. joint recognition is something around 10-20 extra lines of code.
Something that I would recommend to my younger self would be to learn and work with a basic game engine (e.g. Unity) rather than directly with OpenGL. It would give you a lot better and more enjoyable graphics, less hassle and would also enable you to easily integrate your program with other tools such as PhysX. I haven't tried any, but I know there are some plugins for using Kinect drivers in Unity.