Exporting embedded Adobe PDF Reader text - pdf

I have an embedded Adobe PDF Reader in my Windows application. When I open a certain PDF file, what I need to do is manually select text in that PDF and transfer it to a textbox. I haven't done much work with embedded PDF components, but I can see two potential solutions: either find where the embedded component exposes the selected text, or use the clipboard to copy the selected text and transfer it to the textbox.
Can anyone help me with this? To put it plainly, I want to know the best way to access text (selected or not) in the embedded PDF Reader component.
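One possible direction, if running a script inside the viewer is acceptable, is Acrobat's JavaScript API, which can read the words on a page whether or not they are selected. This is a minimal sketch under stated assumptions: it assumes the code runs in Acrobat's or Reader's own JavaScript engine (the embedding control may not expose text to the host application at all), and `pageText` is a hypothetical helper name. `getPageNumWords` and `getPageNthWord` are Doc methods from the Acrobat JavaScript API.

```javascript
// Hypothetical helper: collects the words on one page of `doc`, the Acrobat
// document object (`this` in a document-level or console script).
function pageText(doc, pageIndex) {
  var words = [];
  // Page indices are 0-based in the Acrobat JavaScript API.
  var count = doc.getPageNumWords(pageIndex);
  for (var i = 0; i < count; i++) {
    // bStrip = true removes punctuation and white space from each word,
    // so the result is a space-separated word list, not the exact layout.
    words.push(doc.getPageNthWord(pageIndex, i, true));
  }
  return words.join(" ");
}
```

In Acrobat's JavaScript console, `console.println(pageText(this, 0));` would print the words on the first page. Getting that text into a textbox in the host application is a separate problem this sketch does not solve.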

Related

Is there a recommended workflow to automate producing PDF forms with embedded javascript?

We use a lot of PDF forms with embedded JavaScript. We generate PDFs from LibreOffice, then use Acrobat to add PDF controls and JavaScript. This isn't working well, because a change to the appearance of the form in LibreOffice then causes additional work in Acrobat to put the PDF controls back where they should be, and then redo the JavaScript.
Is there a smart way to generate PDFs with the PDF controls built in (text input boxes, check boxes, radio buttons, digital signature boxes), with all the JavaScript included in the source file?
For example, is there a tool that could convert an HTML form with embedded JavaScript into a PDF with the same JavaScript running in the PDF?
I use a two-step process.
First, create the PDF using Adobe InDesign. InDesign can add interactive PDF fields to your document so that the fields are present when it is exported.
The second step is to run a script that adds actions to each field, like this:
this.getField("foo").setAction("MouseUp", "app.beep(0);");
See the Acrobat JavaScript API documentation for the other triggers and actions.
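To address the rework problem in the question, the second step can be driven from a single table kept outside the PDF and re-run after every regeneration of the form. This is a sketch, not the answerer's actual script: `fieldActions`, `applyFieldActions` and the field name `agreeBox` are hypothetical, while `getField` (which returns null for an unknown name) and `setAction` come from the Acrobat JavaScript API.

```javascript
// Hypothetical table: field name -> MouseUp script. Only "foo" comes from the
// one-line example above; "agreeBox" is a placeholder.
var fieldActions = {
  "foo": "app.beep(0);",
  "agreeBox": "app.alert('Thanks for agreeing.');"
};

// Attaches each script as the field's MouseUp action and returns the names of
// fields that were not found (e.g. renamed or dropped in the new layout).
function applyFieldActions(doc, actions) {
  var missing = [];
  for (var name in actions) {
    if (!actions.hasOwnProperty(name)) continue;
    var field = doc.getField(name); // null when no such field exists
    if (field === null) {
      missing.push(name);
    } else {
      field.setAction("MouseUp", actions[name]);
    }
  }
  return missing;
}
```

With the regenerated PDF open, running `console.println(applyFieldActions(this, fieldActions));` in Acrobat's JavaScript console would reapply every script and list any fields that need attention. Returning the missing names, rather than failing silently, is what makes the re-run safe after a layout change.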

Print to a searchable, selectable PDF from an existing searchable, selectable PDF

I am trying to print a section of an existing PDF to a new PDF. The original is searchable and selectable, but the new PDF is neither. I am using "Adobe Acrobat Reader DC" and printing via "Microsoft Print to PDF". I'm not sure whether any other information is relevant.
After searching for a while, I could not find an answer that allows direct PDF-to-PDF printing.
I did find a workaround, however.
I downloaded a free program called PrimoPDF. Once installed, PrimoPDF becomes a printer option within Adobe Acrobat Reader. I then selected my desired pages and printed to PrimoPDF instead of Microsoft Print to PDF. This generated a .ps file. I then imported the .ps file into the PrimoPDF application and was able to generate a .pdf from it. The newly generated PDF was searchable and selectable, exactly what I needed.
Hopefully someone else finds this useful in the future.
Generally, refrying (printing to PostScript, then converting back to PDF) is a bad idea. Microsoft Print to PDF created a file that wasn't searchable because, when Adobe Reader detects that the target printer can't render the PDF correctly for any number of reasons (for example, it doesn't have the right fonts), it renders the PDF itself and sends an image to the printer. A simpler PDF probably would have worked just fine.
You are much better off getting a tool that simply extracts the pages you need into a new file, rather than printing.
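As one example of extracting rather than printing, full Acrobat (not the free Reader) exposes `Doc.extractPages`, which copies pages into a new file, so their text is not re-rendered as an image. The sketch below is hedged: `extractSection` is a hypothetical wrapper, the output path is a placeholder in Acrobat's device-independent path style, and saving to `cPath` is assumed to be allowed only from the JavaScript console or a batch action.

```javascript
// Hypothetical wrapper: copies an inclusive, 1-based page range into a new file.
function extractSection(doc, firstPage, lastPage, outPath) {
  if (firstPage < 1 || lastPage < firstPage || lastPage > doc.numPages) {
    throw new Error("page range out of bounds");
  }
  // extractPages takes 0-based page numbers, so shift the range down by one.
  doc.extractPages({ nStart: firstPage - 1, nEnd: lastPage - 1, cPath: outPath });
  return lastPage - firstPage + 1; // number of pages extracted
}
```

For instance, `extractSection(this, 3, 5, "/c/temp/section.pdf");` in the console would write pages 3 to 5 to a new file. The wrapper exists mainly to keep the 1-based page numbers users see separate from the 0-based numbers the API expects.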

PDF cannot display Chinese fonts in table of contents

I made a PDF file from LaTeX (using Texmaker).
Acrobat Reader is able to display BOTH the text and the table of contents in Linux.
But Acrobat Reader is unable to display the table of contents in Windows XP (the Chinese characters came out as boxes). However, the text is displayed correctly.
I tried to embed the fonts into the PDF, but none of the methods I tried was fully successful, so I'm not sure whether the fonts are embedded correctly. Either way, the table of contents remains unreadable in Windows.
I wonder whether it really is a font embedding problem. Or do I need to install these "Adobe Reader X Font Packs":
https://www.adobe.com/support/downloads/detail.jsp?ftpID=4883
My concern is that I'd like my PDF to be readable in Windows, including the table of contents (and preferably without further installations), if that is possible.
I suspect you are talking about "bookmarks", rather than saying that part of the text in the document displays correctly and part does not. PDF bookmarks are part of the application's UI and are not rendered with the document's embedded fonts. Therefore, the system you are running on must itself have fonts that can display the language(s) in question.
See https://forums.adobe.com/thread/1144972?start=0&tstart=0
Embedding the fonts will have no effect on the bookmarks.

How can I edit the search text of a searchable PDF?

I have access to a scanner at my library which can create "searchable PDFs." These are PDFs that show the exact image of a scanned document but also contain hidden text, which gets selected when you select a portion of the image that contains text. In this way you can copy and paste text or search for text in the scanned document. This is VERY useful. It's an awesome improvement over raw scanned images. I also have several apps on my Mac that can create this kind of searchable PDF from a scanned document or a raw image.
Now, as anyone who has ever used OCR knows, converting images to text is not 100% accurate, so the text that you search or copy will be wrong in some places.
So I searched for quite some time for an application that would load a searchable PDF and let me repair the hidden text without reformatting or modifying the original scanned image.
Does anyone know of a tool (or library API) that would allow this?
It's worth saying that I tried the latest version of Adobe Acrobat DC for Mac, and it doesn't seem to even let me view the hidden text, much less edit it. It does let me replace the scanned image with the results of its own OCR process so that I can edit and save the document, but this would produce horrible results for any of the scanned documents I am using. It seems designed for editing a "native PDF", not a scanned document.
I have also tried ABBYY FineReader with no luck.
I'm using ABBYY FineReader 12 Professional (not open source).
Just open a scanned image or scanned PDF and press Verify Text (or Ctrl+F7), then go through all the spelling errors and low-confidence characters and fix them.
The program is very good: it shows the exact place in the image/PDF to correct and the OCR guess side by side for convenience, and it iterates through all of them.
[By the way, I use these shortcuts to speed things up:
Alt+Enter to add the unrecognized word to the dictionary.
Ctrl+Delete to skip a word, or to confirm it if you fixed it.]
Then save the document as a PDF file (Menu: File > Save Document As > PDF File), and you can search it in any PDF reader. The saved file looks the same as the scanned one, but there is text 'behind' it.
It's strange that you had no luck with ABBYY; it works great for me. Maybe you didn't try the Professional version.
Hope it helps you.
Creating a searchable PDF from images is not what the poster is after; he wants to start with an already searchable PDF and modify its text (e.g. because a searchable PDF was made initially, but an overlooked recognition error was found later and needs correcting). I see no way, and no tool, that helps with this.

Difference between Microsoft Report Viewer and Adobe PDF Reader tools?

I would like to display a PDF on my WinForm and am thinking of using one of those tools in my VB.NET application. Does anyone know the difference between the two?
Microsoft Report Viewer reads report definition files and displays the report. Adobe's PDF reader displays PDF files.
Report definition files != PDF files, so you would need to make sure that you use the right tool for the right job. If you need to read PDFs, use a PDF reader.
As for displaying a PDF on a WinForm, you could host a WebBrowser control and point it at the PDF. Alternatively, there are several WinForm control vendors whose controls read and display PDF files (though I haven't used any of them, so I can't recommend one over another). Examples include:
http://www.tallcomponents.com/
http://www.skysof.com/