There are many GUI automation tools that allow clicking on a specified image (well-known Sikuli, for example). Is there any way to click on the specified text, not image? This way the tool will:
make screenshot
recognize text on it
find text position (somehow)
send click event to this position
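The steps above can be sketched roughly as follows. This is only a minimal sketch under stated assumptions: it assumes an OCR engine has already returned one bounding box per recognized word, and the `WordBox` type and `find_text_center` helper are hypothetical names introduced for illustration. Taking the screenshot and sending the click are left to whatever automation tool is in use.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class WordBox:
    """One OCR-recognized word and its bounding box in screen pixels (hypothetical type)."""
    text: str
    left: int
    top: int
    width: int
    height: int


def find_text_center(words: List[WordBox], phrase: str) -> Optional[Tuple[int, int]]:
    """Return the center (x, y) of the first run of consecutive words matching phrase."""
    targets = phrase.lower().split()
    if not targets:
        return None
    for start in range(len(words) - len(targets) + 1):
        run = words[start:start + len(targets)]
        if [w.text.lower() for w in run] == targets:
            # Merge the boxes of all matched words into one rectangle.
            left = min(w.left for w in run)
            top = min(w.top for w in run)
            right = max(w.left + w.width for w in run)
            bottom = max(w.top + w.height for w in run)
            return ((left + right) // 2, (top + bottom) // 2)
    return None


# Example data standing in for real OCR output of a screenshot.
ocr_words = [
    WordBox("File", 10, 5, 30, 12),
    WordBox("Save", 100, 200, 40, 16),
    WordBox("As", 146, 200, 20, 16),
]
click_point = find_text_center(ocr_words, "Save As")
```

The returned point is what would be handed to the click step; a result of None means the text was not found on the screenshot.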
It would be much easier to write tests using this approach (many interfaces have text button, inputs etc.) rather than make screenshots for every single element.
I've seen an OCR feature in Sikuli, but it didn't work for me (I tried invoking click('some-text-here')).
Sikuli's built-in OCR features are pretty buggy and unstable. All (or at least most) of the related issues are listed in this BUG. There are a few possible workarounds, although they are not always applicable.
If the text is known, you can take a screenshot of the text and then look for it as an image. For example, if you know the exact font of the text, you can automatically generate the text on the screen and use it as a pattern to locate it elsewhere.
The built-in Tesseract-based OCR usually performs significantly better when the font is bigger, "fatter" and in grayscale. Hence you might do some image preprocessing in the background before attempting the actual recognition. I used ImageMagick to resize and filter the images for better recognition. It can be run in the background as a command-line tool. For example (input and output file names omitted):
convert -filter spline -resize 100x -unsharp 10x20 -type Grayscale
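If the preprocessing is driven from a script, the same call could be wrapped roughly like this. It is a sketch, not a tested recipe: it assumes ImageMagick's `convert` is on the PATH, the file names are placeholders, and `build_convert_command` and `preprocess` are hypothetical helper names. The filter values are simply the ones quoted above and will likely need tuning per font and screen.

```python
import subprocess
from typing import List


def build_convert_command(src: str, dst: str) -> List[str]:
    """Build an ImageMagick command line using the filters quoted above."""
    return [
        "convert", src,
        "-filter", "spline",
        "-resize", "100x",
        "-unsharp", "10x20",
        "-type", "Grayscale",
        dst,
    ]


def preprocess(src: str, dst: str) -> None:
    """Run ImageMagick; raises CalledProcessError if convert exits with an error."""
    subprocess.run(build_convert_command(src, dst), check=True)


# Placeholder file names; nothing is executed until preprocess() is called.
command = build_convert_command("screenshot.png", "screenshot_ocr.png")
```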
I am aware that this does not answer your question directly but these are steps you might consider taking towards the final solution.
I'm a developer at Deskover, and we are currently developing an application, UiPath Studio, that meets your needs.
We provide text recognition on various technologies with 100% accuracy, the ability to find specific text in an area of the screen, a control or an entire window, and the ability to click text or controls.
You can execute different actions sequentially by creating workflows.
We at Deskover are big fans of the Sikuli project. We actually use the same image recognition engine in UiPath Studio.
UiPath Studio is a visual tool that helps you create workflows easily, but you can also use the underlying API and implement an application that extracts text and clicks on it.
You can find more details about the UiPath library here.
I am attempting to create a script for Adobe After Effects. Part of what I am attempting to accomplish will require converting layers imported from Illustrator into After Effects shape layers.
I am having trouble finding any info on how this can be accomplished in ExtendScript. Is it possible?
Any menu command in After Effects is available to ExtendScript, even if it is not included in the API. To invoke a command as you would from a menu, you use
app.executeCommand(1234);
Where 1234 is the number of the command you want. To find this magic number there is a function
app.findMenuCommandId("Full text of command as it appears in the menu");
It's kludgy, and there's no guarantee that Adobe will keep the numbers consistent between releases, but it's all we have. More details and a list of magic numbers here
I'm looking to automate software on Windows 2008. The automation software doesn't have to be Windows 2008 compatible (I can use remote desktop).
The GUI has two main areas, a list of embedded images on the left, and a display pane on the right. The display pane shows where all the embedded images have been placed on the screen (the program is used for building Human Machine Interfaces [HMI's]).
I need to click each of the embedded images in the list on the left and extract some data from them. The problem is that, depending on the main display file chosen, the list of embedded images will have different names and a different length.
The automated task therefore changes depending on which main display file is opened. Is there an automation program that can be customized for this? I could write separate scripts for each main display file, but this defeats the purpose of automating. I looked into Sikuli, AutoIt, pywinauto and others, but have not found examples of what I'm trying to accomplish.
AutoHotkey can do what you're asking for with little difficulty.
You can use some basic OOP principles to write one program that has different clicking locations etc. based on which display file you're running.
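To illustrate the idea of one program with per-file settings, here is a rough sketch written in Python rather than AutoHotkey. Everything in it is an assumption made for illustration: the `DisplayProfile` class, the profile names, the coordinates and the `click` callback are hypothetical, and in practice the item count would be read from the application at run time rather than hard-coded.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class DisplayProfile:
    """Layout of the image list for one display file (all values hypothetical)."""
    name: str
    first_item: Tuple[int, int]  # screen position of the first list entry
    row_height: int              # vertical distance between entries
    item_count: int              # number of embedded images in the list

    def click_points(self) -> List[Tuple[int, int]]:
        """Return one click position per list entry, top to bottom."""
        x, y = self.first_item
        return [(x, y + i * self.row_height) for i in range(self.item_count)]


def run(profile: DisplayProfile, click: Callable[[int, int], None]) -> int:
    """Click every entry of the profile using the supplied click callback."""
    points = profile.click_points()
    for x, y in points:
        click(x, y)
    return len(points)


profiles = {
    "pump_station": DisplayProfile("pump_station", (40, 120), 18, 3),
    "boiler_room": DisplayProfile("boiler_room", (40, 120), 18, 5),
}

# A list stands in for the real mouse here so the sketch runs anywhere.
clicked: List[Tuple[int, int]] = []
count = run(profiles["pump_station"], lambda x, y: clicked.append((x, y)))
```

The same `run` function serves every display file; only the profile data differs.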
I have built an application that automates the filling out of form fields inside a pdf. It then takes various assets and combines them together to generate a "print ready" product. All of this is accomplished using the magic of iTextSharp. When form fields are populated, they are then flattened to text. The problem is that even with the fonts embedded they aren't really attached to the form fields in a meaningful way (like straight text elements are) and the printers are complaining that the pdf is generating licensing errors due to this. I researched this a bit and it just seems to be the nature of how form fields are.
The artists we are working with requested that we research a way to "outline" the text that is created from flattening the form fields. I found that running the PDF Optimizer with a custom preset allows for Text Outlining in Acrobat, and even better, I can generate an Acrobat Sequence that runs this command on the pdf. The problem is that Sequences cannot be automated at all.
I found a plug-in called AutoBatch that allows Sequences to be executed from the command line through a batch file. The downside is that this would require installing Acrobat Pro and the plug-in on the server this application will be running on. Further, it seems like an overkill solution just to outline the text in the pdf. For all I know at this point, iTextSharp may allow me to do this programmatically, but searching for such a thing on Google returns few results and nothing relevant.
So the question: Is there a better way to outline text in a pdf than the current solution I have implemented or am I kind of stuck?
TLDR; PDF is generated w/ non-standard fonts. I need to "outline" this text to send it to the printer. Currently using AutoBatch Acrobat Plug-In to execute Acrobat Sequence from the Command Line. Seems excessive, wondering if anyone knows a better way to automate font outlining.
I am also in a printing environment and have used forms for "Box Covers" plenty of times to shorten the code used to produce box covers.
I simply use "pdfStamper.FormFlattening = true;" and the printers (Xerox DP180 and DC5000) have no problems using the PDF.
The moment I leave out FormFlattening the printer gives a lot of errors regarding the PDF.
If you are using FormFlattening then check if the printer has the font locally installed in order for it to reference the font from the print engine instead of the PDF resources.
Here's a tough one:
I need to be able to find a word's position and size (its frame) on the screen (its first occurrence is enough; from there I should be able to get the next ones).
For example, I would like to be able to detect word positions in (but not limited to) Word, Excel and PowerPoint for Mac, as well as Safari and others.
The solution should be as fast as possible; I should be able to find at least 5-6 words per second and use as little CPU time as possible.
Here's what I thought of so far:
OCR in a window's screenshot / graphics context (any good Open Source framework that works on Mac OS X 10.4 and that can be used in a commercial product?). Evernote is very good at spotting words in images. I don't know if it uses a custom in-house engine or an Open Source / commercial one but that would be the kind of engine I would like to use if this is a "valid" solution. Ideally I would detect the word's frame in the active application's window (how to get the frame of another application?).
Getting some kind of "hook" on Quartz drawing of text and intercepting the location of the word when it's drawn (does not seem very feasible at first glance!).
AppleScript, but it depends a lot on what API the application offers (I don't think you can get a word's coordinates in a Word document from what I've seen) and it's slow.
... out of ideas ...
My goal is to get the frames of all the words in a paragraph, in the right order, based on a string containing the text of the paragraph.
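Assuming some detection step (OCR or otherwise) already yields word and frame pairs in arbitrary order, the ordering part of that goal could be sketched as below. The `frames_in_paragraph_order` helper and the `Frame` tuple layout are hypothetical, and the sketch assumes that repeated words appear on screen in top-to-bottom, left-to-right order and that the detected text matches the paragraph words exactly (no punctuation handling).

```python
from typing import Dict, List, Optional, Tuple

Frame = Tuple[int, int, int, int]  # (x, y, width, height)


def frames_in_paragraph_order(
    paragraph: str, detected: List[Tuple[str, Frame]]
) -> List[Optional[Frame]]:
    """Return one frame per paragraph word, in reading order; None where none was detected."""
    # Queue the detected frames per word so repeated words map to successive frames.
    pending: Dict[str, List[Frame]] = {}
    for word, frame in detected:
        pending.setdefault(word, []).append(frame)
    # Within each word, sort frames top-to-bottom, then left-to-right.
    for frames in pending.values():
        frames.sort(key=lambda f: (f[1], f[0]))
    result: List[Optional[Frame]] = []
    for word in paragraph.split():
        queue = pending.get(word, [])
        result.append(queue.pop(0) if queue else None)
    return result


# Example detections, deliberately out of order.
detected_words = [
    ("dog", (90, 20, 30, 10)),
    ("the", (50, 20, 30, 10)),
    ("cat", (40, 0, 30, 10)),
    ("the", (0, 0, 30, 10)),
    ("saw", (0, 20, 40, 10)),
]
ordered = frames_in_paragraph_order("the cat saw the dog", detected_words)
```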
Thanks in advance for any hints!
As a starting place, you may want to take a look at QuickCursor's code. It retrieves text from many different applications through the AX Accessibility APIs. Now, it won't grab the pixel placement of the word, but it will at least return the NSString associated with the text in that UI element. Of course this means that the app in question has to support these APIs; I don't know if the MS Office suite would. In addition, it only supports editable elements, so an un-editable webpage in Safari won't work either. But it may give you a starting point for some ideas.
Take a look at the QCUIElement.{m,h}, and then the implementation in the QCAppDelegate.m (beginQuickCursorEdit:)... the implementation of his abstracted QCUIElement seems to be as simple as:
QCUIElement *focusedElement = [QCUIElement focusedElement];
id value = focusedElement.value;
Edit: Aha! Check out the Accessibility Inspector Sample code: UIElementInspector. It can actually get the AXPosition of elements on a page. Now, it's not word-by-word, but we're getting closer. It'll tell you the x,y placement of a textblock, as well as the words contained in the textblock.
This is possible, but very hard to get working reliably. You can play with Spell Catcher's Direct Connect feature to see an example.
I am trying to identify the right tool, language, software package, or other for the automated development of presentations, where the presentation is user interactive.
The presentation will consist of images with titles and some descriptive text. Most of the time there will be 35–70 images. I would like to show each image on a separate page, slide, tab, etc. (I guess proper terminology depends on the solution.)
The images will change, but the titles will remain the same, and there will be a little bit of change to the description of each image.
After putting the presentation together, I would like the user to be able to circle and "write" on the electronic image in kind of the wax pencil sense (I previously worked in a photo lab and we worked with wax pencils on negatives all the time and would like to have kind of a similar flexibility). Moreover, I would like users to be able to add comments as well, kind of in the way Adobe PDF Professional allows, e.g. inserting bubble comments, etc.
Most importantly, I would like to be able to do this in an automated way. Right now we are using PowerPoint, but the amount of time it is taking to put an image on a slide in PowerPoint, resize it, and then set up the text is killing us. Plus, as the images change it takes tons of time to go back and update them. Thus, we would like something that is a bit faster to update images and get the feedback from our few users. Does not necessarily have to be a web hosted solution, but could be run through a browser.
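For the "run through a browser" option, the slide-building part can be automated with very little code. The following is only a sketch of that one step, assuming each slide is described by an image path, a title and a description; the `build_slides` helper and the sample file names are hypothetical. It does not cover the drawing and commenting requirements.

```python
import html
from typing import List, Tuple


def build_slides(slides: List[Tuple[str, str, str]]) -> str:
    """Render (image_path, title, description) tuples as one HTML page, one section per image."""
    sections = []
    for image_path, title, description in slides:
        sections.append(
            "<section>\n"
            f"  <h2>{html.escape(title)}</h2>\n"
            f'  <img src="{html.escape(image_path, quote=True)}" alt="{html.escape(title, quote=True)}">\n'
            f"  <p>{html.escape(description)}</p>\n"
            "</section>"
        )
    body = "\n".join(sections)
    return f"<!DOCTYPE html>\n<html>\n<body>\n{body}\n</body>\n</html>\n"


# Placeholder slide data; in practice this could be read from a folder or a CSV file.
page = build_slides([
    ("images/sample_01.png", "Front view", "Baseline image."),
    ("images/sample_02.png", "Side view", "Contrast adjusted & cropped."),
])
```

Because the titles stay the same and only the images and descriptions change, regenerating the page after an update is a matter of rerunning the script with new data.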
Sorry this is so long and thanks for any ideas and feedback, especially if there is an existing software package solution, language that can be used, or other approach to get this done.
These days, two of the most popular are Adobe Captivate and Articulate Presenter. If you prefer a service to a product, you can check out services like http://voicethread.com.
I don't know of any product that completely answers your requirements.
But, for similar results I use two different tools for developing the presentations and another one for drawing while presenting.
If I just want to make a presentation made of pictures and text, and I want to automate its creation, I use IrfanView http://www.irfanview.com/ with its wonderful feature for automated slideshows. I put all the images together, annotate them (I use either their filenames or, if that is not enough, the EXIF and comment fields) and create a slideshow that can be compiled into an .exe file.
If I want a more elaborate presentation with full annotation capabilities, I use Wink http://www.debugmode.com/wink/
For drawing over the screen during the presentation, I use a very old bitmap drawing program called PC-Draw. With one hotkey it captures the screen as a bitmap and lets you begin drawing over it, and with another hotkey it returns to the original screen without altering the running programs at all. I have not found it anywhere on the web. However, I found similar programs just a quick Google search away.
All three tools are free and easy (and even fun) to use.