Possible to get _all_ rendered text on OS X? - objective-c

I'm looking to log all the text that gets displayed on my OS X 10.6 machine. e.g. all webpage text (no matter the browser), PDF text (not necessarily the entire PDF, but at very least all the text that was actually viewed), anything I type into emacs, any email I write.
I've looked at the Accessibility API, but it seems to be more about describing function than content, and in any case it relies on application developers having implemented accessibility objects. Is there something lower-level? Perhaps I can watch everything that goes through the OS font renderer?
After searching for a while, my impression is that Apple doesn't explicitly make this possible, so I'm open to any hackish suggestions you might have.

You'd have to get deep inside the Window Server to have any hope of getting all the text that was written to the screen. I suppose you could patch it yourself, but it's hard to see how without source. What you want has obvious nefarious uses so there's hardly going to be a public API for it.
Just a shot in the dark, but what about turning on Screen Sharing on the 'target' Mac and pointing a modified VNC client at it? I don't know whether text is sent as text over VNC or not, but if it is, that might be one place to start. It's effectively giving you a Window Server equivalent that you control.

Related

Download pdf - accessibility for screen readers

I'm curious how to make an accessible button for screen readers which downloads PDF.
I know that one option is to use an anchor whose href points to the PDF file, and even to add a download attribute to the anchor to open a download window.
But it's not a good way for a screen reader. The screen reader announces it as a link, but it isn't really a link, because it triggers a download of a PDF file rather than navigating to another page. This can be confusing for people with vision disorders who rely on their screen readers.
Is creating such a button good for accessibility? Or is relying on <a href='path-to-pdf'>...</a> completely sufficient and not confusing for people with disabilities?
General answer and basics of file download
Both a link and a button are perfectly fine, it doesn't make much difference.
In any case, it's very important to explicitly indicate that the link or button is going to download a file rather than open a page, to avoid surprise.
The simplest and most reliable is just to write it textually, i.e. "View the report (PDF)".
You may also put a PDF icon next to the link to indicate it, but make sure to use a real image, i.e. <img alt="PDF" />, and not CSS stuff, since the latter may not be exposed to screen readers and/or doesn't give you the opportunity to set alt text (which is very important).
A good practice is also to indicate the file size if its size is big (more than a few megabytes), so that users having a slow or limited connection won't get stuck or burn their mobile data subscription needlessly.
It's also good to indicate the number of pages if there are more than a few, so that people have an idea of how big the document is and whether they can really take the time required to read it.
Example: "View the report (PDF, 44 pages, 17 MB)"
Similarly, it's good practice to indicate the duration of a podcast or video beforehand.
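The labelling advice above can be sketched in a few lines of Python. This is only an illustration: the function names download_label and download_link and the path /files/report.pdf are made up for the example and come from no particular framework.

```python
from html import escape


def download_label(title, file_type, pages=None, size_mb=None):
    """Build link text that states file type, page count and size up front."""
    details = [file_type]
    if pages is not None:
        details.append(f"{pages} page" if pages == 1 else f"{pages} pages")
    if size_mb is not None:
        details.append(f"{size_mb} MB")
    return f"{title} ({', '.join(details)})"


def download_link(href, label):
    """Render a plain anchor whose visible text is the full label."""
    return f'<a href="{escape(href, quote=True)}">{escape(label)}</a>'


# Hypothetical file path, used only for illustration.
label = download_label("View the report", "PDF", pages=44, size_mb=17)
print(download_link("/files/report.pdf", label))
```

Keeping the details in the visible text, rather than in a tooltip or a CSS icon, means sighted users and screen reader users get the same information.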
Additional considerations with PDF
First of all, you should make sure that your PDF is really accessible. Most aren't by default, sadly.
You should easily find resources on how to proceed to make a PDF accessible if you don't know.
Secondly, for an accessible PDF file to actually be read accessibly, it has to be opened in a real PDF reading program which supports tagged PDFs, like Adobe Reader.
The problem is that nowadays, most browsers have an integrated PDF viewer. These viewers usually don't support tagged PDFs, so even if you make an accessible PDF, it won't be accessible to the user if it is opened inside that integrated browser viewer.
So you must make sure that your link or button triggers an effective download or opening in a true PDF reading program, rather than opening in an integrated viewer of the browser.
There are several ways to bypass the integrated viewer. They may or may not work depending on the OS and browser, so they have to be tested:
Send a header Content-Disposition: attachment; filename="something.pdf"
Send a Content-Type other than "application/pdf" or "text/pdf", e.g. "application/octet-stream", to fake out basic type detection
Make sure the link doesn't end with .pdf
Use the download attribute of <a>
The most reliable are response headers. Most browsers don't rely on the file extension alone to decide what to do.
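As a rough sketch of the response-header approach, the Python function below builds the headers a server could send with the file. The function name pdf_download_headers and its parameters are hypothetical, and how the dictionary is attached to a response depends on the web framework in use. It only handles plain ASCII filenames; non-ASCII names need the extended filename* form, which is not shown.

```python
def pdf_download_headers(filename, size_bytes=None, generic_type=False):
    """Return response headers that ask the browser to download, not display.

    generic_type=True swaps the MIME type for application/octet-stream,
    the "fake out basic type detection" option from the list above.
    """
    # Drop characters that could break the quoted filename or the header line.
    safe_name = "".join(
        "_" if ch in '"\\\r\n' else ch for ch in filename
    )
    headers = {
        "Content-Type": (
            "application/octet-stream" if generic_type else "application/pdf"
        ),
        "Content-Disposition": f'attachment; filename="{safe_name}"',
    }
    if size_bytes is not None:
        headers["Content-Length"] = str(size_bytes)
    return headers


for name, value in pdf_download_headers("something.pdf", size_bytes=1024).items():
    print(f"{name}: {value}")
```

Whichever combination is chosen still has to be tested against the browsers and screen readers you care about, as noted above.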
Either a link or a button is fine. The most important thing is that the user is informed about what the element does - i.e. it downloads/opens a PDF file. So, this should be reflected in the element's label, whether that is a visible text label or an icon that uses alt text or aria-label to explicitly describe the element's purpose.
I agree with QuentinC's suggestion to also inform the user upfront about the number of pages and size of the document - that's a nice touch that I don't see very often!
PDF accessibility is a whole other topic, but again as QuentinC points out, there's not much good in allowing a user to download or view a PDF that isn't accessible, so it's a good idea to test the PDF against JAWS/NVDA/VoiceOver/TalkBack to make sure it is readable.

Edge Animate Automation

Most Adobe products have the ability to be automated using AppleScript or ExtendScript/JavaScript but I don't seem to see the same capabilities in Edge Animate. Maybe I'm just missing something. I'm looking to be able to do things like open the document, add images, save the document, etc. Has anyone been able to find anything like this? I've done a number of different searches to no avail.
I'm not exactly sure what you're asking, but I think you're talking about adding your own JavaScript, which can be done by clicking the curly brackets located next to any of the elements in the animation, or hitting Ctrl+E to see the full code.
Second, in terms of opening the document, you should be able to just double-click the .an file that it creates, and saving the document is just like in any other program, file>save(as).
Adding images is as simple as file>Import (hotkey = ctrl + I).
Not exactly what I was asking for, but I did end up finding a solution. I was looking for a mechanism to control Edge Animate similar to how you can control InDesign, Photoshop, and Flash via VBScript and JavaScript. This allows you to do things like import images into your document from an external script, save the document, export contents, etc. In the end, I wrote some code that sends keystrokes to the application, and that resolved the problem, although not ideally, IMHO. Thanks for your responses, though.

How to show document preview in iCloud conflicts sheet in Mac App using NSDocument

I am creating a Mac App, using NSDocument, that stores a custom class of documents to iCloud.
I was able to get the program to store documents to iCloud quite easily by just code signing it, sandboxing it, and adding iCloud entitlements. However, I'm still encountering a problem: when I trigger an iCloud conflict and the program drops down the sheet that lets the user resolve the conflict, the rows in the sheet do not show the small image of the document (like Preview and TextEdit do).
Additionally, when I click on the area where the image should be (it's blank) it opens up a Quick Look window that just displays an image of the Document Icon together with some other information as opposed to a snap shot of the actual file like Preview and TextEdit do.
I have not found any information in Apple's documentation that explains what I need to do to implement the same behaviour as Preview and TextEdit.
So far I've been surprised by how easily I've been able to get all of the functionality of not only Auto Save and the Versions browser, but also saving to the cloud. NSDocument seems to do all of this for the developer (resolving iCloud conflicts, etc.), as Apple's documentation says it does, but again I'm not getting this other behaviour and I don't want to reinvent the wheel by writing code that is not needed.
I'm thinking that the answer might lie somewhere in implementing a Quick Look thumbnail (for the small image in the table in the sheet) and a Quick Look preview (for the larger preview of the document when the thumbnail in the sheet is clicked), but this seems like a lot of work and I'm afraid of losing some of the other built-in functions of NSDocument if I start "trapping" NSDocument routines up the food chain, so to speak.
Has anyone else encountered this problem and found the easiest solution?
Update: Dec. 25/12
I've finally figured out that the problem is that I need a Quick Look generator to provide both a QL thumbnail (which shows up in the table in the conflicts sheet) and a QL preview (which is displayed when the user clicks on the thumbnail).
I ended up creating the QL generator project, and afterwards creating a workspace which I added my main project and the QL generator project to. After that I added a Copy Files Build Phase to the main project to copy the QL generator into the main Application bundle.

Finding a word's frame (position and size) on the screen using Cocoa or Carbon

Here's a tough one:
I need to be able to find a word's position and size (its frame) on the screen (its first occurrence is enough; from there I should be able to get the next ones).
For example, I would like to be able to detect word positions in (but not limited to) Word, Excel and PowerPoint for Mac, as well as Safari and others.
The solution should be as fast as possible; I should be able to find at least 5-6 words per second and use as little CPU time as possible.
Here's what I thought of so far:
OCR in a window's screenshot / graphics context (any good Open Source framework that works on Mac OS X 10.4 and that can be used in a commercial product?). Evernote is very good at spotting words in images. I don't know if it uses a custom in-house engine or an Open Source / commercial one but that would be the kind of engine I would like to use if this is a "valid" solution. Ideally I would detect the word's frame in the active application's window (how to get the frame of another application?).
Getting some kind of "hook" on Quartz drawing of text and intercepting the location of the word when it's drawn (does not seem very feasible at first glance!).
AppleScript, but it depends a lot on what API the application offers (I don't think you can get a word's coordinates in a Word document from what I've seen) and it's slow.
... out of ideas ...
My goal is to get all the words' frames in a paragraph, in the right order, based on a string containing the text of the paragraph.
Thanks in advance for any hints!
As a starting place, you may want to take a look at QuickCursor's code. It retrieves text from many different applications through the AX Accessibility APIs. Now, it won't grab the pixel placement of the word, but it will at least return the NSString associated with the text in that UI element. Of course this means that the app in question has to support these APIs; I don't know if the MS Office suite would. In addition, it only supports editable elements, so an un-editable webpage in Safari won't work either. But it may give you a starting point for some ideas.
Take a look at the QCUIElement.{m,h}, and then the implementation in the QCAppDelegate.m (beginQuickCursorEdit:)... the implementation of his abstracted QCUIElement seems to be as simple as:
QCUIElement *focusedElement = [QCUIElement focusedElement];
id value = focusedElement.value;
Edit: Aha! Check out the Accessibility Inspector Sample code: UIElementInspector. It can actually get the AXPosition of elements on a page. Now, it's not word-by-word, but we're getting closer. It'll tell you the x,y placement of a textblock, as well as the words contained in the textblock.
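If the target application does expose its text through the Accessibility API, one avenue that might be worth trying is the parameterized bounds-for-range attribute (AXBoundsForRange), which takes a character range and returns a rectangle. Whether Word, Excel, PowerPoint or Safari answer that query is an assumption that would have to be tested. The Python sketch below only covers the bookkeeping side of the question: it turns the paragraph string into per-word (location, length) ranges in reading order, counted in UTF-16 units as NSString does. The function names are made up for the example.

```python
import re


def utf16_len(text):
    """Length of text in UTF-16 code units, the unit NSString ranges use."""
    return len(text.encode("utf-16-le")) // 2


def word_ranges(paragraph):
    """Return (word, location, length) for each whitespace-separated word.

    The tuples come back in the order the words appear in the paragraph.
    """
    ranges = []
    for match in re.finditer(r"\S+", paragraph):
        location = utf16_len(paragraph[:match.start()])
        ranges.append((match.group(), location, utf16_len(match.group())))
    return ranges


for word, location, length in word_ranges("Hello big world"):
    print(word, location, length)
```

Each range would then be passed to whatever call resolves a range to a screen rectangle, one word at a time; that call is not shown here because it depends on the application cooperating.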
This is possible, but very hard to get working reliably. You can play with Spell Catcher's Direct Connect feature to see an example.

Automated Development of Presentation with Interactivity

I am trying to identify the right tool, language, software package, or other for the automated development of presentations, where the presentation is user interactive.
The presentation will consist of images with titles and some descriptive text. Most of the time there will be 35–70 images. I would like to show each image on a separate page, slide, tab, etc. (I guess proper terminology depends on the solution.)
The images will change, but the titles will remain the same, and there will be a little bit of change to the description of each image.
After putting the presentation together, I would like the user to be able to circle and "write" on the electronic image in kind of the wax pencil sense (I previously worked in a photo lab and we worked with wax pencils on negatives all the time and would like to have kind of a similar flexibility). Moreover, I would like users to be able to add comments as well, kind of in the way Adobe PDF Professional allows, e.g. inserting bubble comments, etc.
Most importantly, I would like to be able to do this in an automated way. Right now we are using PowerPoint, but the amount of time it is taking to put an image on a slide in PowerPoint, resize it, and then set up the text is killing us. Plus, as the images change it takes tons of time to go back and update them. Thus, we would like something that is a bit faster to update images and get the feedback from our few users. Does not necessarily have to be a web hosted solution, but could be run through a browser.
Sorry this is so long and thanks for any ideas and feedback, especially if there is an existing software package solution, language that can be used, or other approach to get this done.
These days, two of the most popular are Adobe Captivate and Articulate Presenter. If you want a service rather than a product, you can check out services like http://voicethread.com.
I don't know of any product that completely answers your requirements.
But, for similar results I use two different tools for developing the presentations and another one for drawing while presenting.
If I just want to make a presentation made of pictures and texts, and I want to automate its creation, I use IrfanView http://www.irfanview.com/ with its wonderful feature for automated slideshows. I put all the images together, annotate them (I use either their filenames or, if that's not enough, the EXIF and comment fields) and create a slideshow that can be compiled into an .exe file.
If I want a more elaborate presentation with full annotation capabilities, I use Wink http://www.debugmode.com/wink/
For drawing over the screen during the presentation, I use a very old bitmap drawing program called PC-Draw. With one hotkey it captures the screen as a bitmap and lets you begin drawing over it, and with another hotkey it returns to the original screen without altering the running programs at all. I have not found it anywhere on the web. However, I found similar programs just a quick google away.
All three tools are free and easy (and even fun) to use.
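Since the question allows for something that runs in a browser, another option is to generate the pages from a script. The Python sketch below is a minimal illustration under that assumption: it takes a list of (image path, title, description) entries and produces one HTML file with a section per image, so swapping images means editing the list and re-running the script. The function name, the markup and the file names are invented for the example, and it does not cover the drawing or commenting requirements.

```python
from html import escape

SLIDE_TEMPLATE = """<section class="slide">
  <h2>{title}</h2>
  <img src="{src}" alt="{title}" style="max-width: 100%">
  <p>{description}</p>
</section>"""


def render_slides(slides):
    """Return a complete HTML page with one section per slide entry."""
    sections = [
        SLIDE_TEMPLATE.format(
            title=escape(title),
            src=escape(src, quote=True),
            description=escape(description),
        )
        for src, title, description in slides
    ]
    body = "\n".join(sections)
    head = '<head><meta charset="utf-8"><title>Slides</title></head>'
    return f"<!DOCTYPE html>\n<html>\n{head}\n<body>\n{body}\n</body>\n</html>\n"


# Hypothetical image names; in practice these would come from a folder listing.
page = render_slides([
    ("img/001.png", "Front view", "Taken before retouching."),
    ("img/002.png", "Side view", "Same subject, second exposure."),
])
print(page)
```

Writing the returned string to a file such as slides.html gives something a browser can open directly.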