OCR (reading text from photos) in Cocoa? - objective-c

Is there any code out there, that I can use in Cocoa, to recognize text from photos? Let's say I snap a photo with my iPhone of a page of a book. I'd like to capture the text in it.

There is the Tesseract OCR toolkit, an open source OCR engine currently maintained by Google. "Olipion" created a cross-compilation tutorial to get it on the iPhone. I would say that this is a good place to start.
However, there are reasons why you might not want to do OCR on the phone even if you could. Some of these include:
Even the new iPhone 4's processor is not that fast, and since your app can't really run in the background doing the processing, the user experience might not be optimal.
Running OCR on a mobile device would probably be a killer for battery life.
Every time you would want to update the OCR engine everybody who installed your app would have to upgrade.
For an always-connected mobile device, running the OCR on a server somewhere would probably be better. You could upgrade your OCR software easily, you could run much more powerful algorithms than a mobile device could handle, and so on.
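To make the server-side idea concrete, here is a minimal sketch (in Python rather than Objective-C, purely to show the shape of the request) of packaging a photo as a multipart/form-data upload. The URL, the form field name "image" and the function name build_ocr_upload are all made up for illustration; a real OCR service would define its own endpoint and field names.

```python
import uuid
import urllib.request


def build_ocr_upload(url, image_bytes, filename="page.jpg"):
    """Build (but do not send) a multipart/form-data POST carrying one photo.

    The field name "image" is an assumption; use whatever your server expects.
    """
    boundary = uuid.uuid4().hex
    # Part header: names the form field and declares the payload as JPEG.
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="image"; filename="{filename}"\r\n'
        "Content-Type: image/jpeg\r\n"
        "\r\n"
    ).encode("utf-8")
    # Closing delimiter: the boundary followed by two dashes.
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return urllib.request.Request(
        url,
        data=head + image_bytes + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Passing the returned request to urllib.request.urlopen would actually send it. What comes back (plain text, JSON, something else) depends entirely on the server you put behind it.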
I am not so sure that you would be able to get good results from photos taken using a mobile camera -- accuracy of OCR systems goes way down with the kind of poorly lit, noisy, distorted images likely to be captured using a phone camera.
As far as commercial products go, there is Evernote, which gives you an OCR capability if you buy their premium service.
As an alternative to machine OCR, there is always Mechanical Turk, where you could pay people a small amount to do the OCR for you. Humans would probably do better at transcription given the image source.

Related

LabVIEW 2018 USB Webcam Image Grab

I'm looking to capture an image from my usb webcam in LabVIEW 2018. I've looked at older posts (the one from Lava working with the 'free' portions of V&M Toolkit, another webcam test that hangs my computer when trying to run and a few others). What is the best way to do this in the newer LabVIEWs? All the examples I've seen (none of which run correctly or well) are all from 2011-ish timeframe.
It depends on the task (that is, what you are going to use the camera for), but you could use NI Vision Acquisition Software, which provides a set of functions to access the camera, acquire images and videos and process them (basically, the IMAQ drivers are what you need). Or, if you are going to use your camera for some kind of test application (vision inspection), then you'd better check Vision Builder for Automated Inspection.
Those are the easiest (but not the cheapest) ways to acquire images from the various cameras using LabVIEW.
UPDATE:
The license scheme for the software can be found here - Licensing National Instruments Vision Software. A description of each software component is also here - Does My Camera Use NI-IMAQ, NI-IMAQdx or NI-IMAQ I/O?. So in order to use a 3rd party USB camera, one needs to have NI-IMAQdx, which requires a license.

Streaming IP Camera solutions that do not require a computer?

I want to embed a video stream into my web page, which is part of our own cloud based software. The video should be low-latency (like video conferencing), and it would be preferable, but not required, for it to include audio. I am comfortable serving streaming binary data from the server-side, and embedding it into the page using HTML5 video.
What I am not comfortable with is the ability to capture the video data to begin with. The client does not already have a solution in place, and is looking to us for assistance. The video would be routed through our server equipment, and not be an embedded piece that connects directly to the video source.
It is a known quantity for us to use a USB or built-in camera from the computer. What I would like more information about is stand-alone cameras.
Some models of cameras have their own API documentation (example). It would seem from what I am reading that a manufacturer would typically have their own API which they repeat on many or all of their models, and that each manufacturer would be different in their API. However, I have only done surface reading and hope to gain more knowledge from someone who has already researched this, or perhaps even had first hand experience.
Do stand-alone cameras generally include an API? (Wouldn't this be a common requirement, so that security software can use multiple lines of cameras?) Or if not an API, how is the data retrieved from the on-board webserver? Is it usually Flash-based? Perhaps there is a reusable video stream I could capture from there? Or is the stream formatting usually diverse?
What would I run into when trying to get the server-side to capture that data?
How does latency on a stand-alone device compare with a USB camera solution?
Do you have tips on picking out a stand-alone camera that would be a good fit for streaming through a server?
I am experienced at using JavaScript (both HTML5 and Node.JS), Perl and Java.
Each camera manufacturer has their own take on this from the point of access points; generally you should be able to ask for a snapshot or a MJPEG stream, but it can vary. Take a look at this entry on CodeProject; it tackles two common methodologies. Here's another one targeted at Foscam specifically.
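For the MJPEG case, the server side mostly has to cut the incoming byte stream into individual JPEG images. Below is a minimal sketch of one common approach: scanning for the JPEG start-of-image (FF D8) and end-of-image (FF D9) markers. The function name split_mjpeg_frames is made up for illustration, and the sketch assumes plain frames with no embedded thumbnails (an embedded thumbnail carries its own markers and would confuse this scan). Cameras that serve MJPEG over HTTP typically send it as multipart/x-mixed-replace, so a more careful parser would read each part's headers instead of searching for markers.

```python
SOI = b"\xff\xd8"  # JPEG start-of-image marker
EOI = b"\xff\xd9"  # JPEG end-of-image marker


def split_mjpeg_frames(buffer):
    """Split a byte buffer into complete JPEG frames.

    Returns (frames, remainder). The remainder holds any incomplete frame
    at the end of the buffer; prepend it to the next chunk read from the
    camera before calling this function again.
    """
    frames = []
    pos = 0
    while True:
        start = buffer.find(SOI, pos)
        if start == -1:
            # No frame start left. Keep a trailing 0xFF in case the SOI
            # marker itself was split across two network reads.
            tail = buffer[pos:]
            return frames, (tail[-1:] if tail.endswith(b"\xff") else b"")
        end = buffer.find(EOI, start + 2)
        if end == -1:
            # Frame has started but not finished yet.
            return frames, buffer[start:]
        frames.append(buffer[start:end + 2])
        pos = end + 2
```

In use, you would read chunks from the camera's HTTP response in a loop, call split_mjpeg_frames(remainder + chunk) each time, and forward each returned frame to your clients.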
Get a good NAS. I suggest Synology; check out their long list of supported IP web cams. You can connect them with a hub or with a router or whatever you wish. It's not a "computer" as in "tower", but it does many computer jobs, and it can stay on while your computer is off or away, and do things like video feeds, torrents, backups, etc.
I'm not an expert on all the features, so I don't know how to get it to broadcast without recording, but even if it does record, at least it's separate. Synology is a popular brand and there are a lot of authorized and unauthorized plugins for it. Check them out and see if one suits you.

Good speech recognition engine for Mac, not iOS?

Sorry if this is a repeat question, but I didn't see it anywhere.
I'm working on a Mac program that will take voice commands, and NSSpeechRecognizer isn't quite doing it for me.
I want something a little more dynamic so I can set alarms, make dates, give more natural commands, etc.
Every open source speech engine I've found is tailored toward iOS. Do OpenEars, VocalKit, etc. work just as well for Mac programs?
Speech recognition is exceptionally non-trivial. The engines that are free are free for a reason. If you expect dictation in any amount (like an alarm label), you're out of luck. There are reasons Siri requires an entire data center. The open source packages available won't get you much further than simple telephone auto-attendants.
Unless you have an extensive statistics background and free time, I'd recommend that you pursue licensing a commercial library or server implementation.
PocketSphinx from Carnegie Mellon is about the only option:
http://cmusphinx.sourceforge.net/

Mac OS X equivalent for DirectShow, GraphEdit

New to Mac OS X, familiar with Windows. Windows has DirectShow, a good number of built-in filters, COM programming, and GraphEdit for very fast prototyping and snooping on the graphs you've constructed in code.
I'm now about to go to the Mac to work with cameras, webcams, microphones, color spaces, files, splitting, synchronization, rendering, file reading, file saving, and many of the things I've come to take for granted with DirectShow when putting together applications for live performance. On the Mac side, so far I've found ... nothing! Either I don't know where to look or I'm having the toughest time tying the Mac's reputation for its ease of handling media with a coherent programmatic ability to get in there and start messin' with media manipulatin' building blocks.
I've seen some weak suggestions to use gstreamer or some library for QT but I can't bring myself to believe that this is the Apple way to go. And I've come across some QuickTime documentation but I'm not looking to do transitions, sprites, broadcasting, ...
Having a brain trained on DirectShow means I don't even know how Apple thinks about providing DirectShow-like functionality. That means I don't know the right keywords and don't even know where to look. Books? Bought a few. Now I might be able to write some code that can edit your sister's wedding video (if I can't make decent headway on this topic I may next be asking what that'd be worth to you), but for identifying what filters are available and how to string them together ... nothing. Suggestions?
Video handling is going through a huge transition on the Mac at the moment. QuickTime is very old, but also big and powerful, so it's been undergoing an incremental replacement process for the past 5 years or so.
That said, QTKit is the QuickTime subset (capture, playback, format conversion and basic video editing) which is supported going forward. The legacy QuickTime APIs are still there for the moment, and probably will remain at least until its major features are available elsewhere, but are 32-bit only. For some involved video stuff you may end up needing to use it in places.
At the moment, iOS is ahead of the Mac because it could start from scratch with AV Foundation. The future of the Mac media frameworks will probably either be AV Foundation directly (with QTKit being a lightweight shim over the top) or an extension of QTKit that looks very similar.
For audio there's Core Audio which is on Mac and iOS and isn't going away any time soon. It's quite powerful but somewhat obtuse in places. Luckily online support is very good; the mailing list is an essential resource.
For filters and frame-level processing you've got Core Video as someone else mentioned, as well as Core Image. For motion graphics there's Quartz Composer which includes a graphical editor and a plugin architecture to add your own patches. For programmatic procedural animation and easily mixing rendering models (OpenGL, Quartz, video, etc.) there's Core Animation.
In addition to all of these, of course there's no reason you can't use open source libraries where the built-in stuff doesn't do what you want.
To address your comment below:
In QuickTime (and QTKit), individual data types like audio and video are represented as tracks. It may not be immediately clear that QuickTime can open audio as well as video file formats. A common way to combine audio and video would be:
Create a QTMovie with your video file.
Create a QTMovie with your audio file.
Take the QTTrack object representing the audio and add it to the QTMovie with the video in it.
Flatten the movie, so it doesn't simply contain a reference to the other movie but actually contains the audio data.
Write the movie to disk.
Here's an example from Blender. You'll see how the A/V muxing is done in the end_qt function. There's also some use of Core Audio in there (AudioConverter*). (There's some classic QuickTime export code in quicktime_export.c but it doesn't seem to do audio.)

A PDF reader - please guide - a step by step guidance - reference to guidance-

I have to make a hardware project using a microcontroller, memory, screens, etc.
Is it possible to make an independent PDF / documents reader, which is capable of running on battery power?
Please note I don't want to use any technology which needs licensing. It must be all freeware readers, etc., and the programming language can be assembly, C, Flash or any other.
I have submitted a proposal for a PDF reader project (independent hardware). Many say it's impossible. What should I do?
Reading and displaying a PDF document is quite a "high level operation".
You should start with a microcontroller starter kit, with an ARM9 processor or something similar. Then install a Linux operating system on it, include a standard display driver and run an X server. Then you should be able to find a Linux based PDF reader with X drivers.
To second another comment here, I would say that you're not going to do this with a microcontroller; you're going to need a more powerful ARM CPU like an ARM9, Cortex-A8 or similar, with a decent amount of RAM.
You'll probably need something that's capable of running Linux if you want to start with pieces of software that won't require writing quite a large volume of software from scratch.
Note that commercial devices that are out there, including the Kindle, run Linux and aren't based on a microcontroller.
You might be best off getting something like a BeagleBoard, attach a display to that, and start from there with an X-based PDF viewer.