Find string in PDF/a documents

I'm scanning several documents and saving them as PDF/A in one directory. Is it possible to search for a string inside all of these documents, and if so, with which software?
For example, one of the documents contains the string stackoverflow.
After typing stackoverflow, I want to be taken to that document.

Try this:
Convert each PDF to a searchable PDF, that is, run OCR so the scanned pages get a text layer (e.g. with PDF Converter Professional 8).
Put all the documents in one folder.
Open Foxit Reader, click the small folder icon in the right corner, set the search location to that folder, and enter your string.
Enjoy!
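If you would rather script the search than click through a reader, the same idea can be sketched in Python. This is only a rough sketch, not part of the answer above: it assumes the PDFs already have a text layer (OCR done), that the third-party pypdf package is installed, and the names find_pdfs_containing and extract_text_with_pypdf are made up for illustration.

```python
from pathlib import Path
from typing import Callable, Iterator


def extract_text_with_pypdf(pdf_path: Path) -> str:
    """Return the text layer of a PDF (assumes the pypdf package is installed)."""
    from pypdf import PdfReader  # third-party; imported lazily so the rest works without it

    reader = PdfReader(str(pdf_path))
    # Image-only pages have no text layer, so fall back to an empty string.
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def find_pdfs_containing(
    folder: Path,
    needle: str,
    extract_text: Callable[[Path], str] = extract_text_with_pypdf,
) -> Iterator[Path]:
    """Yield every *.pdf under folder whose text contains needle (case-insensitive)."""
    wanted = needle.casefold()
    for pdf_path in sorted(folder.rglob("*.pdf")):
        if wanted in extract_text(pdf_path).casefold():
            yield pdf_path
```

Calling list(find_pdfs_containing(Path("scans"), "stackoverflow")) would then return the matching files, where "scans" is a placeholder folder name. The extractor is passed in as a parameter so a different text-extraction tool can be swapped in without touching the search loop.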

Related

Converted PDF from Word document and wrong title is being displayed

I used a Microsoft Word template to make a few documents, and I then converted those documents into PDFs by Saving as PDF. I'm displaying the PDFs in a web page but all the PDFs have the same title on the web page. I cannot find anywhere where there is a "title" for a Word document and I'm nearly at my wits' end.

I figured it out. I was looking at Word instead of the PDF itself. I can change the title under File > Properties in Adobe Acrobat Pro DC.

How do I extract the Arabic text of this PDF file correctly?

Today I tried to search for an Arabic word in a PDF file that contains Arabic content.
None of the PDF readers I tried can find any Arabic word in this PDF file.
So I dragged the PDF file into the Firefox browser, selected an area that contained some words with Inspect Element, and saw this:
hw ½oiC instead of آخرین سخن
What type of encoding is used in this PDF file?
How can I convert this to normal text?

It's difficult to comment on the file without seeing it, but a good starting point is Acrobat: either copy the text and paste it into a text editor, or search for the text. Either test will reveal whether the text can be extracted correctly.
If it can't be extracted properly, there's a good chance the font is lacking a ToUnicode entry (see Section 9.10.1 of the ISO 32000-1:2008 PDF specification for more information).
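To check for the missing ToUnicode entry yourself, something like the following Python sketch may help. It is an illustration under assumptions, not a definitive diagnostic: it assumes the third-party pypdf package, the helper names are invented here, and it only looks at fonts listed directly in a page's resources (not inside form XObjects). A font without /ToUnicode can still be extractable if it uses a standard encoding, so treat a hit as a hint rather than proof.

```python
from typing import Any, List, Mapping


def fonts_missing_tounicode(fonts: Mapping[str, Any]) -> List[str]:
    """Return the names of fonts whose dictionary has no /ToUnicode entry."""
    missing = []
    for name in fonts:
        # Index by key rather than using .items(): with pypdf dictionaries,
        # indexing resolves indirect references to the real font dictionary.
        font = fonts[name]
        if "/ToUnicode" not in font:
            missing.append(str(name))
    return missing


def page_fonts(pdf_path: str, page_number: int = 0) -> Mapping[str, Any]:
    """Fetch one page's /Font resource dictionary (assumes pypdf is installed)."""
    from pypdf import PdfReader  # third-party; imported lazily

    page = PdfReader(pdf_path).pages[page_number]
    if "/Resources" not in page:
        return {}
    resources = page["/Resources"]
    if "/Font" not in resources:
        return {}
    return resources["/Font"]
```

For example, fonts_missing_tounicode(page_fonts("document.pdf")) lists the suspect fonts on the first page, where "document.pdf" is a placeholder path.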

Print to pdf that is searchable and selectable from existing pdf that is selectable and searchable

I am trying to print a section of an existing PDF to a new PDF. The original is searchable and selectable, but the new PDF is neither. I am using "Adobe Acrobat Reader DC" and printing via "Microsoft Print to PDF". I'm unsure whether any other information is relevant.

After searching for a while, I could not find an answer that allows a direct PDF-to-PDF print.
I did find a workaround, however.
I downloaded a free program called PrimoPDF. Once installed, PrimoPDF becomes a printer option within Adobe Acrobat Reader. I then selected my desired pages and printed to PrimoPDF instead of Microsoft Print to PDF. This generated a .ps file. I then imported the .ps file into the PrimoPDF application and was able to generate a .pdf from that. The newly generated PDF was searchable and selectable, and exactly what I needed.
Hopefully someone else finds this useful in the future.

Generally, refrying (printing to PostScript and then converting back to PDF) is a bad idea. Microsoft Print to PDF produced a non-searchable file because, when Adobe Reader detects that the target printer cannot render the PDF correctly (for any number of reasons, such as missing fonts), it renders the pages itself and sends an image to the printer. A simpler PDF probably would have worked just fine.
You are much better off with a tool that simply extracts the pages you need into a new file rather than printing.
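As one possible way to extract pages without printing, here is a minimal Python sketch. It assumes the third-party pypdf package is installed; the function names zero_based_pages and extract_pages are hypothetical and chosen only for this example.

```python
from typing import List


def zero_based_pages(first: int, last: int, page_count: int) -> List[int]:
    """Turn a 1-based inclusive page range into 0-based indices, validating bounds."""
    if not 1 <= first <= last <= page_count:
        raise ValueError(f"page range {first}-{last} is outside 1-{page_count}")
    return list(range(first - 1, last))


def extract_pages(src: str, dst: str, first: int, last: int) -> None:
    """Copy pages first..last of src into a new PDF at dst (assumes pypdf is installed)."""
    from pypdf import PdfReader, PdfWriter  # third-party; imported lazily

    reader = PdfReader(src)
    writer = PdfWriter()
    for index in zero_based_pages(first, last, len(reader.pages)):
        # The page objects are copied as they are, not rendered to images,
        # so existing text stays searchable and selectable.
        writer.add_page(reader.pages[index])
    with open(dst, "wb") as handle:
        writer.write(handle)
```

A call such as extract_pages("original.pdf", "section.pdf", 3, 7) would write pages 3 to 7 to a new file; both file names are placeholders.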

Fix PDF with unreadable characters

Example PDF page: https://db.tt/qRcF000k
This is a sample page from a document where copied text shows as question marks in my favorite reader, SumatraPDF (mupdf), just the same as in Adobe Acrobat. My main problem is that, because of this, I cannot search the document, nor can I index it.
OTOH, xpdf's pdftotext extracts the correct text.
In Adobe Acrobat, if I use "Copy as formatted text", the correct text is written to the clipboard, although I still can't search from Acrobat.
Also, if I open the linked page in Firefox's built-in PDF reader, I can copy the text correctly.
Can Ghostscript perhaps be instructed to correct this issue, which I cannot describe other than as 'unreadable characters'?

The PDF file uses subset fonts with non-standard Encodings and no ToUnicode CMaps. So no, you can't have Ghostscript 'correct' this file.
In fact, I can't see how anything can possibly be extracting sensible text from this, and indeed my versions of Acrobat (Pro X and Reader XI) can't copy meaningful text and don't appear to have a 'copy as formatted text' menu item. Can you tell me where to find this?
However, I notice that the PDF file was actually created by Ghostscript (version 9.14), so possibly you mean 'starting with a different input file, which I haven't given you, could I have generated a PDF file where the text could be copied', to which I can only say 'I don't know'; it depends on what was in the original input file.

Show MigraDoc/PdfSharp Document on screen

I want to use MigraDoc/PdfSharp to create and store PDF documents.
Is there a way to show these documents in an application on-screen? I'd like to show the print in my program rather than starting Acrobat Reader with the document name.
I considered storing the print using XPS instead of PDF, but then I'd need a way to convert XPS to PDF for mailing it to customers. And I don't want to save the same print in two formats, for space reasons.

MigraDoc can save files in its own format, "MigraDoc DDL" (MDDDL). You can preview MDDDL on the screen, create PDF or RTF from it, or print it.
Disadvantage: images are not included in the MDDDL file (OTOH, this can be an advantage, as images can be shared between several documents).
You can ZIP the document plus its images for storage.
PDFsharp can create PDF files from XPS (but this is in a beta state and not fully operational).
