Open PDF file in Microsoft Word using OLE

I am looking for the method (of the Word OLE object) that can open a PDF in Microsoft Word.
I want to copy all pages of the PDF into a doc/docx and add footers there.
Could anybody give me a hint on how to import a PDF?
PS: any sample code for this problem would be great.
Thanks,
Lilya

You need an OCR (Optical Character Recognition) engine to convert a PDF to a document. PDF is a generic format and it can contain text as images, so converting a PDF to an editable document is hard. SAP has no OCR function for this. Maybe OpenText (if the customer uses it) has this functionality; I don't have detailed information about OpenText. You need third-party tools for this. If the PDF contains real text, you can use online services or command-line utilities to convert it to text files easily; otherwise you need a professional SDK (for example ABBYY FineReader).

I used Foxit PDF Reader to save the PDF file as a text file and wrote a macro to read the text file. Of course, this way you get only the text and nothing else.

Related

Print to pdf that is searchable and selectable from existing pdf that is selectable and searchable

I am trying to print a section of an existing PDF to a new PDF. The original is searchable and selectable, but the new PDF is neither. I am using Adobe Acrobat Reader DC and printing via "Microsoft Print to PDF". I am unsure whether any other information is relevant.
After searching for a period of time I could not find an answer that allows for direct PDF to PDF print.
I did find a workaround however.
I downloaded a free program called PrimoPDF. Once installed, PrimoPDF becomes a printer option within Adobe Acrobat Reader. I then selected my desired pages and printed to PrimoPDF instead of Microsoft Print to PDF. This generated a .ps file. I then imported the .ps file into the PrimoPDF application and was able to generate a .pdf from it. The newly generated PDF was searchable and selectable and exactly what I needed.
Hopefully someone else finds this useful in the future.
Generally, refrying (printing to PostScript and then converting back to PDF) is a bad idea. Microsoft Print to PDF produced a file that wasn't searchable because, when Adobe Reader detects that the target printer cannot render the PDF correctly (for any number of reasons, for example because it doesn't have the right fonts), it renders the PDF itself and sends an image to the printer. A simpler PDF probably would have worked just fine.
You are much better off getting a tool that will simply allow you to extract the pages you need to a new file rather than printing.
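As a rough illustration of extracting pages instead of printing them, here is a minimal Python sketch. It assumes the third-party pypdf package is installed; the function names and the "2-4,7" page-spec format are made up for this example and are not part of any tool mentioned in this thread. Because the pages are copied rather than re-rendered, the text layer should stay searchable.

```python
def parse_page_ranges(spec: str) -> list[int]:
    """Turn a 1-based spec such as "2-4,7" into zero-based page indices."""
    indices = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-", 1)
            indices.extend(range(int(first) - 1, int(last)))
        else:
            indices.append(int(part) - 1)
    return indices


def extract_pages(src_path: str, dst_path: str, spec: str) -> None:
    """Copy the selected pages into a new PDF without re-rendering them."""
    # Assumption: pypdf is available (pip install pypdf).
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src_path)
    writer = PdfWriter()
    for index in parse_page_ranges(spec):
        writer.add_page(reader.pages[index])
    with open(dst_path, "wb") as out_file:
        writer.write(out_file)
```

A call such as extract_pages("original.pdf", "section.pdf", "2-4,7") would write pages 2, 3, 4 and 7 to the new file (both file names are placeholders).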

How to convert Marathi data from PDF to Excel in proper format

I am converting Marathi data from PDF to Excel or Word, but the text does not come out correctly.
I copied some data from the PDF and pasted it into a Word document, but the text was garbled.
e.g. प्रविण सुधाकर शिरवाडकर this line is in PDF
but when I copied and pasted it into Word, it became
-प्रववर् सुधाकर शिरवाडकर
What should I do about this?
Can anyone please help me?
Thank you in advance.
There seem to be problems in the way the PDF stores Unicode Devanagari text. Try this alternative route: convert your PDF to images. You can use an online or downloaded tool, or on Linux run this command (it uses ImageMagick's convert) in a terminal:
for f in *.pdf; do convert -density 200 "$f" "${f}_200dpi.jpg"; done
Change the density from 200 to another value as needed. Each page of your document should be converted into an image file. For a Windows tool, try https://www.pdfill.com/pdf_tools_free.html
Then, go to http://www.i2ocr.com/free-online-hindi-ocr, upload the image and convert. That uses OCR (optical character recognition).
Check the fonts used in your PDF and try making them available to the Word document.
I think you don't have the particular fonts that are used in the PDF.
In Adobe Reader, File menu > Properties > Fonts tab gives you a list of all fonts used in the document.
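To check whether the pasted text really contains different characters (a problem with how the PDF maps its text) or the same characters drawn with a different font, it can help to compare the code points of the expected and the pasted line. The following is only a diagnostic sketch using Python's standard library; the helper names are invented for this example.

```python
import unicodedata


def describe(text: str) -> list[str]:
    """List every code point of the text with its Unicode name."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}" for ch in text]


def first_difference(expected: str, pasted: str) -> int:
    """Return the index of the first differing code point, or -1 if identical."""
    for i, (a, b) in enumerate(zip(expected, pasted)):
        if a != b:
            return i
    if len(expected) == len(pasted):
        return -1
    # One string is a prefix of the other: they differ where the shorter ends.
    return min(len(expected), len(pasted))
```

Passing the line as it appears in the PDF and the line as pasted into Word shows where they diverge. If the code points differ, installing fonts will not help and the OCR route is the more promising one.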

How can I generate a summary of the highlights in a PDF file?

We all know that we can highlight certain text in a PDF file using either Adobe Acrobat or Preview on Mac. I'm wondering how I can extract all these highlights from a PDF file and generate a summary (a kind of note).
The following post
PDF: standard format for highlights?
points out that there are multiple ways to do highlighting. Will it be a challenge to distinguish the original content of the file from the user-added highlights if shapes with transparency are used to achieve highlights?
Details about this can be found in open-source PDF parsing and rendering libraries; you just have to read the code, or the documentation if it is available.
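As one possible starting point, here is a minimal Python sketch that assumes the highlights were saved as PDF annotations with subtype /Highlight and that the third-party pypdf package is installed. Highlights drawn as transparent shapes in the page content would not be found this way. The function names are illustrative only.

```python
def quad_points_to_rects(quad_points):
    """Convert a flat /QuadPoints list (8 numbers per quad) to (x0, y0, x1, y1) boxes."""
    rects = []
    for i in range(0, len(quad_points) - 7, 8):
        xs = quad_points[i:i + 8:2]
        ys = quad_points[i + 1:i + 8:2]
        rects.append((min(xs), min(ys), max(xs), max(ys)))
    return rects


def list_highlights(pdf_path):
    """Collect page number, attached note and covered boxes of each highlight."""
    # Assumption: pypdf is available (pip install pypdf).
    from pypdf import PdfReader

    found = []
    for page_number, page in enumerate(PdfReader(pdf_path).pages, start=1):
        if "/Annots" not in page:
            continue
        for ref in page["/Annots"]:
            annot = ref.get_object()
            if "/Subtype" not in annot or annot["/Subtype"] != "/Highlight":
                continue
            quads = []
            if "/QuadPoints" in annot:
                quads = [float(v) for v in annot["/QuadPoints"]]
            note = str(annot["/Contents"]) if "/Contents" in annot else ""
            found.append({
                "page": page_number,
                "note": note,
                "rects": quad_points_to_rects(quads),
            })
    return found
```

The boxes are in page coordinates. Turning them into the highlighted words would still require matching them against the positions of the page text, which this sketch does not attempt.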

How to convert Word docs with Bookmarks to PDF using Ghostscript?

I'm converting Word docs to PDF programmatically using VB.NET and Ghostscript. The Word doc I’m having problems with has hyperlinks to external URLs and also hyperlinks to bookmarks within the document. When the doc is converted to PDF, the external URLs work but the links to the bookmarks do not.
I have searched for a solution to get these bookmarks to work on the output PDF but haven’t had any luck. Hopefully someone has done this and can share the solution.
Ghostscript only handles PDF or PostScript as input. There are sibling products to handle XPS and PCL as well, but none of them handle Word .doc files, so you must be converting the Word file into something else.
I'll hazard a guess that you are using the Windows PostScript printer driver to convert to PostScript and passing that to GS (possibly via the RedMon Port Monitor) to convert into PDF.
Now PostScript doesn't support hyperlinks, bookmarks, or any of the other paraphernalia of a viewing application, since it's intended as a print language. To overcome this, Adobe introduced an extension, the pdfmark operator, which can be used to create this kind of information. NOTE: this is an extension which is only supported for conversion to PDF.
So, in order to get these inserted, you need to create pdfmarks in the PostScript. If you are printing from Word, this means that you have to insert PostScript into the file when printing. There is a 'pass through' mechanism for this purpose.
So what you need to do is create the appropriate Visual Basic script in Word which inserts the relevant pdfmarks when the document is printed. This is how the Adobe plug-in for Word (which used to be called PDFMaker a long time ago) works.
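To give an idea of what such pdfmarks look like, here is a small Python sketch that only builds the PostScript text for a named destination, a link pointing to it, and an outline (bookmark) entry. The helper names are invented for illustration. How the page numbers and rectangles are obtained from Word, and how the text is passed through the printer driver, is not shown and would have to be worked out in the macro.

```python
def ps_string(text: str) -> str:
    """Escape text as a PostScript string literal."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def named_destination(name: str, page: int) -> str:
    """pdfmark that defines a named destination on the given page."""
    # The name must be a valid PostScript name (no spaces or delimiters).
    return f"[ /Dest /{name} /Page {page} /View [/XYZ null null null] /DEST pdfmark"


def link_to_destination(name: str, rect: tuple) -> str:
    """pdfmark for a clickable area (in PostScript units) that jumps to the name."""
    x0, y0, x1, y1 = rect
    return (f"[ /Rect [{x0} {y0} {x1} {y1}] /Border [0 0 0] "
            f"/Dest /{name} /Subtype /Link /ANN pdfmark")


def outline_entry(title: str, page: int) -> str:
    """pdfmark that adds an entry to the PDF outline (bookmark pane)."""
    return (f"[ /Title {ps_string(title)} /Page {page} "
            f"/View [/XYZ null null null] /OUT pdfmark")
```

For example, named_destination("chapter2", 3) yields a line that can be placed in the PostScript stream, and link_to_destination("chapter2", (72, 700, 200, 715)) yields the matching link. The coordinates may need adjusting to whatever coordinate system the printer driver has set up at that point.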
Have a look at this tool.
It does maintain bookmarks and hyperlinks.
http://www.transcom.de/transcom/en/2004_pdf-t-maker.htm

Is there any software that supports a bookmark facility for text files?

I have a big file in text format that I am trying to read in Visual Studio. I only need one feature: the software should support bookmarks on specific lines, the way a PDF reader remembers the last viewed page of a PDF file.
When I set a bookmark in Visual Studio and open the file the next time, the bookmark is gone.
So my question is: is there software that handles text files, supports bookmarks on specific lines in the project, and has other features that make reading text easier?
Try OpenOffice.
http://www.brainbell.com/tutorials/ms-office/Word/Return_To_The_Last_Editing_Position_When_Opening_A_Document.htm
Why don't you convert the text file to a PDF file? On the Mac you can do it for free. On Windows you can download the Bullzip free PDF printer to do it. Once it's a PDF you can just use the PDF reader's bookmarking ability.