Extract screenshot or picture of portion of PDF using VBA or VB and Adobe SDK - vb.net

I am currently using an excel macro (although I will switch to VB.NET if necessary) to loop through all of the text in a PDF and populate an array with certain portions of the text in the PDF (via the Adobe SDK and getPageNthWord). This part is working just fine, but now what I want goes a step further.
There are certain portions of the PDF where just grabbing the text isn't giving the full picture, and I'd like to see what more I can get. This is exactly the screenshot or snippet I am trying to get:
So, I know that I could use getPageNthWordQuads to find the coordinates for the words "Compliance Warning" and I could figure out a way to find the bottom right of the screen as well, but my problem starts there. After I get those coordinates what would I do with them? Can I zoom in the PDF to only see that portion and then take a screenshot? I already have the code for a screenshot of the activewindow, but I don't know how to scroll or zoom on a PDF.
Any help would be greatly appreciated. A fresh approach would be welcome as well. Thanks!

There are probably a number of approaches that would work - I don't know enough about your environment / constraints to know for sure which would work best. I'm assuming you are talking to Acrobat through OLE here.
1) You can open a window, get its AVPageView and ask it to zoom and move to where you want it to do your thing.
2) You can open a PDF document in one of your own windows using OpenInWindowEx and then grab the contents of that window (the advantage being that this window could be off screen).
3) You can use the DrawEx method (in AcroExch.PDPage) to render a specific portion of a page into your own window and then process that.

Related

Extract PDF coordinates using mouse click

I want to extract the coordinates of a PDF document with the help of a mouse click. I have gone through some posts but since I'm new to this, I'm not being able to understand it properly. Also, can this be done if I render the PDF file in a web page?
You can add javascript to a pdf document. Although you only get access to a limited subset of the language.
If you only need the coordinates once (for instance when doing layout of the document), you can simply open it with adobe and activate the rulers/grid option to see where your mousepointer is currently located.

Print preview of my web page to pdf and save it on server side programmatically

I have a deceitfully difficult problem which I had thought it was easy one at the beginning and yet I have spent more than 3 days on and off in total.
what I simply need to do is to save the print preview of the page to PDF file on server side from code behind initiating by a button click.
I was expecting using an open source and then I thought there would be a code like xyzopencode.savepasgeaspdf(path) but I could not find it. I got really close to solution by saving the PDF but then I realized it did not save the picture it only saved the strings.
I tried the pdfsharp but as long as I see it draws the whole thing from scratch and I am nor sure if I can do it.
The reason I need picture compatible one is I have 3rd party signature controller on my page and my couple of attempts worked without them or any picture but when I added pictures they failed to show to picture or did not create the PDF at all. The perfect solution would be just saving what ever shows up in the print preview as PDF, just like the built in feature of Google chrome (but on server side).

Photoshop jsx image grid

What I am ultimately trying to do is to create a grid of images for print that are minor variations of the same thing (different text is all). Looking through online resources I was able to create a script that changes the text and exports all of the images necessary (several hundred). What I am trying to do now is to import all of these images into a new photoshop document and lay them all out in a grid and I can't seem to find any examples of this.
Can anyone point me in the right direction to place a file at a specific coordinate (I'm using CS5 and have the design suite so if there is a way in illustrator to do this quickly...)?
Also, I'm open to other ideas on how to do this (even other programs) easily. It's for labels so the positioning on the sheet has to be pretty precise...
The art layer object has a translate() method that takes delta x and y params. You'll need to open each image, copy it to the target document, get its current location (using artLayer.bounds) and do the math to find the deltas to position it where you want it. Your deltas can be in pixels so you'll get plenty of precision.
Check out your 'JavaScript Scripting Reference' pdf in your Adobe install directory for more details.
Ok I'm marking Anna's response as the answer because though I didn't fully test it, it seems like it should work and answers the original question with jsx. However I'm also leaving my final solution in case anyone else runs across this with the same issue and may prefer this method as well.
What I ended up doing instead is using InDesign. I figured out that it has a grid option that lets you import a number of files and place them all in an equal grid in a single command. This is almost exactly what I was looking for, except that it leaves a small border/margin in between the columns and grids and mine were designed to meet exactly.
I couldn't figure out how to make it not have the border (I have very little experience with InDesign, it may be possible). However I was able to select all my images and scale them uniformly to be the correct size, then I just selected each column and dragged it over to snap to the adjacent column and the same with rows...

Edge Animate Automation

Most Adobe products have the ability to be automated using AppleScript or ExtendScript/JavaScript but I don't seem to see the same capabilities in Edge Animate. Maybe I'm just missing something. I'm looking to be able to do things like open the document, add images, save the document, etc. Has anyone been able to find anything like this? I've done a number of different searches to no avail.
I'm not exactly sure what you're asking, but I think you're talking about adding your own javascript, which can be done by clicking the curly brackets located next to any of the elements in the animation, or hittin ctrl+E to see the full code.
Second, in terms of opening the document, you should be able to just double click the an file that it creates, and saving the document is just like in any other program, file>save(as).
Adding images is as simple as file>Import (hotkey = ctrl + I).
not exactly what I was asking for, but I did end up finding a solution. I was looking for a mechanism to be able to control Edge Animate similar to how you can control InDesign, Photoshop, and Flash via VBScript, and JavaScript respectively. This allows you to do things like import images into your document from an external script, save the document, export contents, etc. In the end, I wrote some code that sends key-strokes to the application and that resolved the problem although not ideally, IMHO. Thanks for your responses, though.

Programatically extract content of PowerPoint slides into MS Word-like format?

I'd like to extract all of the information (formatted text, images, etc) from powerpoint slides into a flowing, readable (MS Word-style) format.
I'm not interested in keeping the slide concept at all--think of taking class slides from a college course and batch converting them all into one collective study guide.
I can't find a way to do this within powerpoint (though if you know of one, please share!) and,
I don't have experience scripting Office apps. Is this kind of thing easily done? Does this kind of script already exist somewhere?
Clarification:
In an earlier version of this post, I used the word "flowing" to refer to a slide-free (MS Word-like) format. This does not, however, refer to the actual formatting of slide content. So keeping bullet lists, etc. is fine and even desirable.
I don't see this being a simple task. College professors use a format of either "TITLE: BULLET POINTS OR IMAGE" or "EVERY WORD I'M ABOUT TO SAY" for their slides in my experience, and you're just not going to get flowing, readable text from the former no matter what you do. For the latter, you've already got your text, you just have to copy it to another document.
I think you might as well just open the PowerPoint, select all the text, and copy+paste into Word/Publisher/InDesign/your favorite page layout program. You'll have the same effect and the same amount of editing after the fact except without all the hassle of writing a program to do it for you.
Doing a Print operation to a PDF with the N-up options might be a good solution for handouts if that's all you need. You could expand the idea and condense ALL the slide decks into one, get it printed (with N slides per page and the note space next to it) and bound, and voila, instant study guide. I've seen that, and then you get options for note taking.
More power to you if you're doing this just because you can - don't let me stop you. There is much good learning to be had that way. You might want to look into writing a program using the Microsoft.Office.Interop namespace in .NET (starting at http://msdn.microsoft.com/en-us/library/bb772069.aspx ), or perhaps look on CPAN ( http://search.cpan.org/search?mode=all&query=powerpoint ) and do it with Perl! There are lots of ways to do it, but you've got to be up for the challenge.
Text is fairly simple to extract, but what text do you want? The text from the title and body text placeholders only? File, Save As, and choose to save the outline.
The other text on the slide? That can be pulled out to a text file programmatically, but in what order? Suppose you have a complex diagram with text callouts. Extracting the text is going to give you gibberish. There's no obvious/meaningful order to the text other than what the human viewer supplies by noting that "Ah. The arrow next to this bit of text points to the fribulator sub-assembly, so must relate to it in some way." Try doing that in code. ;-)
You could give the author a way to sort the text into reading order so that the code knows what order to extract it in, but that would require a fair amount of work on the part of the author.
If you can be certain that all of the content is in title+bullet form, no worries. Otherwise, you'd have to be able to articulate exactly what you want extracted, in what form and in what order before you could get anywhere with this.
MS Word-style is not only readable, but writeable as well (which was not specified in your requirements). If you want a read-only guide, PDF is your natural choice (either through Acrobat Distiller or LibreOffice). Combine individual Acrobatted presentations with PDFtk, or Acrobat or Foxit and you're good to go without any programming at all.
"Is this kind of thing easily done?" - Yes, your humble servant did a couple of similar scripts ages ago (extracting enhanced metafiles from Powerpoint slides).
"Does this kind of script already exist somewhere?" - Yes. Probably at hundreds of places, but not sure if any of them get posted to the 'Net. All things considered think you'd be better off learning some scripting and macro programming on your own, since a ready-made script may be not quite fit for your needs - and to understand and rewrite it you'd need more time than to code & debug from scratch.
Since you mention that title+bullet form is ok, open the file, choose to save as and pick Outline as the save-as type.
I think you could parse through the PowerPoint file for formatting, text and pictures. There are Visual Studio namespaces available for such a task. You open the file, parse through it and make Word file from these. Complicated work, as you would have to consider type of elements and their position, you would have to use a temporary structure for each slide.
Have a look at this sample code :
http://msdn.microsoft.com/en-us/library/office/gg278331.aspx
How to: Get All the Text in All Slides in a Presentation
Basically, using c# and openXML SDK 2.0, it loops through all the slides in the presentation, and then adds each text in every slide into a string builder. You can write out the result into a text file if you like (modification required).
Recommendation: <25 oct 2012>
For your study guide, maybe you could extract all the text in each slide, and dump those text programmatically (by adding that function into the sample code above while it's iterating the slides) into the "Notes" section of each slide. With that, you can print it in Notes Page view. You'll get the entire slide image at the top half of the page, and the actual slide texts at the bottom of it in the Notes Page view. It sure beats trying to copy and paste all the text from the slide into the notes section. You can even print it 2 slides per page, as small text would not be an issue inside the slide's image, and diagrams would still be visible more or less.
Unfortunatly, this method works for simple standard slide format ... meaning, it's OK if your slides just have a title, and a center text box with all the bullet points... any complex slide layout (maybe text boxes scattered everywhere) will come out in non-order and will be confusing. But at least you can still look at the slide image above to make sense of it :)