Search and highlight text in a PDF within a WebBrowser control in VB.NET

I'm using a WebBrowser control in VB.NET to display PDF files, simply by calling:
WebBrowser1.Navigate(output_filepath & "#view=Fit")
Acrobat Standard X and XI are used, along with IE11.
Now I'm trying to implement a search function that searches the PDF files and, if the search term is found, focuses on and highlights it. I haven't really found a good hint. Is there a way? Which one?
Is more information needed?
I'm keen to read your hints!
EDIT:
I want to search for words in a searchable PDF file shown in the WebBrowser control.
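Adobe's "Parameters for Opening PDF Files" reference describes a `search` open parameter that opens the Search panel and highlights the first match for a quoted, space-separated word list, and it allows several parameters in one fragment separated by `&`. Below is a minimal, language-neutral sketch (in Python; the string logic maps directly onto VB.NET concatenation) of building the value you would pass to `WebBrowser1.Navigate`. It assumes the Acrobat X/XI browser plug-in under IE11 honours `search` the same way it already honours `view=Fit`, which is not verified here; `build_pdf_url` is a hypothetical helper name.

```python
def build_pdf_url(filepath: str, words: list[str], view: str = "Fit") -> str:
    """Append Acrobat open parameters (view + search) to a PDF path.

    Assumption: the embedded Acrobat plug-in reads the URL fragment,
    as it already does for "#view=Fit" in the question.
    """
    # Double quotes delimit the search list, so drop any stray quotes from the terms.
    cleaned = [w.replace('"', "") for w in words]
    cleaned = [w for w in cleaned if w.strip()]
    fragment = f"#view={view}"
    if cleaned:
        # Open parameters are separated with '&'; search words are space-separated.
        fragment += '&search="' + " ".join(cleaned) + '"'
    return filepath + fragment


if __name__ == "__main__":
    print(build_pdf_url(r"C:\out\report.pdf", ["Compliance", "Warning"]))
```

Note that, per Adobe's description, `search` looks for any of the listed words rather than an exact phrase. In VB.NET the same string is built with `&`, doubling the quote characters inside string literals.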

Related

Creating a pdf using Word VBA and Adobe Acrobat

I am trying to write a macro that creates a pdf of a Word document.
The code I have been using to do this uses ActiveDocument.ExportAsFixedFormat. This works up to a point, but it tends to fail when creating a pdf of a large document (and some of the documents I'll be processing with this macro run to thousands of pages).
My understanding is that the ExportAsFixedFormat method uses Word's built-in PDF creation methods. What I really want to do is use Adobe Acrobat to do the PDF conversion. If I do that manually by clicking on the export as pdf buttons within Word (I have Adobe Acrobat installed on my machine) then everything is fine. It uses the actual Adobe Acrobat PDF conversion, and my PDF gets created without errors even on documents large enough to cause the ExportAsFixedFormat method to fail.
I've been trying to figure out how to automate the conversion to PDF from VBA using Acrobat, and banging my head against a brick wall.
I discovered the CreatePDFEx method, which in theory looks like it should do what I want, but I also discovered warnings that this is not a supported method and is not recommended. See here:
https://forums.adobe.com/thread/286431
And indeed when I tried it, it didn't work.
I then discovered the AcroExch.AVDoc and AcroExch.PDDoc objects, which looked like they might offer another way. See, for example, here:
https://forums.adobe.com/thread/301714
That also came with a warning that it wasn't supported. When I tried it, it worked, but it was painfully slow, even with documents of just a few pages. I hate to think what it would be like with a 5000 page document.
Is it actually possible to do this? It doesn't seem like it should be rocket science, but I am failing to find anything that works.
All I want to be able to do is to reliably create pdfs from large Word docs. I think that probably the way to do that is to figure out how to use Adobe's tools via VBA (is there a supported method?), but I'd be perfectly happy with the built in Word method if I could solve the problem of it failing with large documents.
Many thanks for any help.
Edit: I should also have mentioned that I need my Word headings to end up as PDF bookmarks. The ExportAsFixedFormat method does that, but some other methods don't.
You can use the SaveAs or SaveAs2 method to save as a PDF instead of ExportAsFixedFormat. For example:
ActiveDocument.SaveAs FileName:="Filename.pdf", FileFormat:=wdFormatPDF
After a bit more experimentation, I did come up with one officially supported method that uses the Adobe PDF printer:
ActivePrinter = "Adobe PDF"
ActiveDocument.PrintOut
The problem with that is that it doesn't turn my Word headings into PDF bookmarks.
Does anyone know if it's possible to set some options in that code so that it does?

How do I style a word document exported from a webpage in VB.Net

I'm trying to export text retrieved from a database into a word document in VB.Net and while I have a working example, I need to figure out how to style some sections of the document appropriately.
I have found a few working examples in Microsoft's online resources (such as this one) that cover some basics:
para.Range.Text = "Quad Chart"
para.Range.Style = "Heading 1"
para.Range.Font.Bold = True
But they don't cover even some of the simplest formatting, such as:
How do you align the text (left, right, center)?
How do you specify leading (line spacing)?
How do you start a list style?
What I'm trying to find is either a straight answer to these or (even better), a definitive list of the commands that would allow most any formatting.
Also, I would prefer not to use Spire, which seems to be a common answer.
Thanks!
The Word VBA object model documents all the classes, methods, and properties you can use to mark up content.
Your suggestion to use styles is strongly recommended as a way of separating your code from the presentation. Create a document template (.dot or .dotx, depending on Word version) and attach this to your documents. Then, when the document is opened, it will inherit layout and presentation from the template and be correctly rendered.
The list creation is a little intricate as you will need to restart the list if you are using numbering.
If you are interested in a completely different approach, you can look at Applying an XSLT Transform in the Microsoft Office Word 2003 XML Software Development Kit. This describes how to generate XML documents and use XSL transforms to describe the presentation. It is more general, but definitely more complex to set up.
Your preferred approach will depend on whether you want to generate native documents with a template, or to require your users to install the transform using the tools in the SDK.
So, you have a few examples. Office VBA is a cut-down version of VB6, so why not record some macros in Word, open the VB editor, and look at what they do? It's also the easiest way to navigate the help on the Word object model.

Extract screenshot or picture of portion of PDF using VBA or VB and Adobe SDK

I am currently using an Excel macro (although I will switch to VB.NET if necessary) to loop through all of the text in a PDF and populate an array with certain portions of the text in the PDF (via the Adobe SDK and getPageNthWord). This part is working just fine, but now I want to go a step further.
There are certain portions of the PDF where just grabbing the text isn't giving the full picture, and I'd like to see what more I can get. This is exactly the screenshot or snippet I am trying to get:
So, I know that I could use getPageNthWordQuads to find the coordinates of the words "Compliance Warning", and I could figure out a way to find the bottom right of the screen as well, but that's where my problem starts. Once I have those coordinates, what do I do with them? Can I zoom the PDF so only that portion is visible and then take a screenshot? I already have the code for a screenshot of the active window, but I don't know how to scroll or zoom within a PDF.
Any help would be greatly appreciated. A fresh approach would be welcome as well. Thanks!
There are probably a number of approaches that would work - I don't know enough about your environment / constraints to know for sure which would work best. I'm assuming you are talking to Acrobat through OLE here.
1) You can open a window, get its AVPageView, and ask it to zoom and scroll to the region you want.
2) You can open a PDF document in one of your own windows using OpenInWindowEx and then grab the contents of that window (the advantage being that this window could be off screen).
3) You can use the DrawEx method (in AcroExch.PDPage) to render a specific portion of a page into your own window and then process that.
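For options 1 and 3, the quads returned by `getPageNthWordQuads` can first be reduced to a single rectangle to zoom to or render. A minimal sketch, assuming each quad is 8 numbers (four x, y corner points) in PDF user space with the origin at the bottom-left of the page, and that the region you want runs from the top-left of the "Compliance Warning" words down to the bottom-right corner of the page. `region_from_quads` is a hypothetical helper, and you may still need to convert the result to device coordinates before handing it to AVPageView or DrawEx.

```python
from typing import Sequence

# One quad: x1, y1, x2, y2, x3, y3, x4, y4 (corner order is not relied upon).
Quad = Sequence[float]


def region_from_quads(quads: Sequence[Quad], page_width: float) -> tuple[float, float, float, float]:
    """Return (left, top, right, bottom) spanning from the words' top-left to the page's bottom-right.

    Assumes PDF user space: origin at the bottom-left and y growing upwards,
    so the bottom edge of the page is y = 0.
    """
    if not quads:
        raise ValueError("need at least one quad")
    # Gather every corner of every matched word.
    xs = [q[i] for q in quads for i in range(0, 8, 2)]
    ys = [q[i] for q in quads for i in range(1, 8, 2)]
    left = min(xs)  # leftmost edge of the matched words
    top = max(ys)   # highest edge, i.e. the top of the words
    return (left, top, page_width, 0.0)
```

Taking min/max over all corners means the result does not depend on the order in which the corners appear in each quad.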

Create slideshow from text

I would like to read text from 3 text paragraphs and use it to programmatically create a slideshow of 3 slides, each containing one paragraph.
Is it possible? Do I need to use openoffice, libreoffice or something else?
I have googled a lot, but could not find any answer. Hence, posting the question on SO.
Thanks.
OpenOffice has a bridge called UNO, which can be used from Python, Java (and probably other languages), i.e. you can manipulate OpenOffice documents from an external program. However, it is non-trivial to use.
Another possibility is to use OpenOffice.org BASIC.
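Whichever tool ends up creating the slides (UNO, OpenOffice.org BASIC, or something else), the first step is the same: split the source text into one chunk per slide. A minimal sketch, assuming paragraphs are separated by blank lines; `paragraphs_to_slides` is a hypothetical helper, and the actual slide creation through UNO or BASIC is not shown.

```python
import re


def paragraphs_to_slides(text: str) -> list[str]:
    """Split text on blank lines; each non-empty paragraph becomes one slide body.

    Assumption: paragraphs are separated by one or more blank lines.
    """
    chunks = re.split(r"\n\s*\n", text.strip())
    # Collapse internal line breaks and runs of whitespace so each slide gets one clean paragraph.
    return [" ".join(chunk.split()) for chunk in chunks if chunk.strip()]
```

Each returned string would then be written into the body placeholder of a new slide, one slide per list entry.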

PDF Outline Text - Automation of Acrobat Sequences

I have built an application that automates the filling out of form fields inside a pdf. It then takes various assets and combines them together to generate a "print ready" product. All of this is accomplished using the magic of iTextSharp. When form fields are populated, they are then flattened to text. The problem is that even with the fonts embedded they aren't really attached to the form fields in a meaningful way (like straight text elements are) and the printers are complaining that the pdf is generating licensing errors due to this. I researched this a bit and it just seems to be the nature of how form fields are.
The artists we are working with requested that we research a way to "outline" the text that is created from flattening the form fields. I found that running the PDF Optimizer with a custom preset allows for Text Outlining in Acrobat, and even better I can generate an Acrobat Sequence that runs this command on the pdf. The problem is that Sequences can not be automated, at all.
I found a plug-in called AutoBatch that allows Sequences to be executed from the command line through a batch file. The downside is that this would require installing Acrobat Pro and the plug-in on the server this application will run on. Furthermore, it seems like overkill just to outline the text in the PDF. For all I know at this point, iTextSharp may allow me to do this programmatically, but searching Google for such a thing returns few results and nothing relevant.
So the question: Is there a better way to outline text in a pdf than the current solution I have implemented or am I kind of stuck?
TLDR; PDF is generated w/ non-standard fonts. I need to "outline" this text to send it to the printer. Currently using AutoBatch Acrobat Plug-In to execute Acrobat Sequence from the Command Line. Seems excessive, wondering if anyone knows a better way to automate font outlining.
I am also in a printing environment and have used forms for "Box Covers" plenty of times to shorten the code used to produce box covers.
I simply use "pdfStamper.FormFlattening = true;" and the printers (Xerox DP180 and DC5000) have no problems using the PDF.
The moment I leave out FormFlattening the printer gives a lot of errors regarding the PDF.
If you are using FormFlattening then check if the printer has the font locally installed in order for it to reference the font from the print engine instead of the PDF resources.