Qpdf: Font changes after recompression - pdf

I uncompressed (qpdf --stream-data=uncompress) a pdf file. After that in Notepad I changed some flat text (that was uncompressed) and saved it.
After recompressing (qpdf --stream-data=compress) the file and trying to open it again in Adobe Acrobat (pro) I get an error about "can not take out AAAAA+Arial-BoldTM" font. A second message says: there is an error at reading a stream.
After clicking OK the rest is well displayed with the changed text but in different font and Bold not displayed and also an image missing.
Did I forget something or shoud I apply different options?
Please your help/comment.

Related

Font Not Displaying Properly in PDF

I am trying to save a pdf from illustrator and I have never had this issue, the font looks fine in illustrator, but when I save the pdf and open the pdf in a pdf viewer the "i" character now has a box beneath the text but the dot of the i stays there.
When viewed in illustrator:
When viewed in a PDF viewer:
I know that when the square shows up it means the font you are trying to use isn't there however the other characters appear fine, it just seems to be the I which is odd. The font passed verification (for reference it is Playfair Display
Does anyone know how to fix this or why this could be occurring? Am I exporting wrong(I've never had this issue before with exporting)?
Thanks in advance!
Update: I solved my question while writing it. The font that was installed was a variable font type (I downloaded it from Google), for some reason it doesn't seem to want to play nicely in a pdf (maybe I'm saving it incorrectly?). I deleted the variable font and installed the static versions of the font and now the issue has gone away.
I don't know too much about variable fonts but it seems like they are maybe a bit finicky?
Hope this can help others!

Text changed to graphics, still selectable in PDF?

I have this PDF ebook with selectable text - the handwriting - but there is no such font embedded and the letters are all different, so it's not actually a font. How is this possible?
I've worked with CorelDraw and Adobe Acrobat, but I can't understand how this works.
The left side of the picture shows the document properties, the right side shows a page of the PDF file and I selected the last 3 rows. I can copy and paste that to a text file, no problem. How was this achieved?
There are a few possibilities but the most likely is the text is being converted to outlines/paths or vectors. Some software such as Adobe InDesign and other print design apps allow you to 'flatten' a font based text into vector or paths, meaning the original font isn't required to be embedded or installed on the system. The original text data is however still present and able to be copied into a text field or word processor.

PDF cannot display Chinese fonts in table of contents

I made a PDF file from Latex (using TexMaker).
Acrobat Reader is able to display BOTH the text and the table of contents in Linux.
But Acrobat Reader is unable to display the table of contents in Windows XP (the Chinese characters came out as boxes). However, the text is displayed correctly.
I tried to embed the fonts into the PDF but the various methods are not 100% successful, so I'm not sure if the fonts are embedded correctly or not. Anyway, the table of contents remain unreadable in Windows.
I wonder if it is really an font embedding problem? Or do I need to install these "Adobe Reader X Font Packs":
https://www.adobe.com/support/downloads/detail.jsp?ftpID=4883
My concern is that I'd like my PDF to be readable in Windows, including the table of contents (and preferably without further installations). If this is possible...
I suspect you are talking about "bookmarks" and not saying part of the text in the document is ok and part is not. PDF Bookmarks are part of the UI of the application and are not selected from embedded fonts. Therefore, the system you are running on needs to know how to handle fonts in the language(s) of choice.
See https://forums.adobe.com/thread/1144972?start=0&tstart=0
Embedding the fonts will have no effect on the bookmarks.

How can I edit the search text of a searchable PDF?

I have access to a scanner at my library which can create "searchable PDFs." These are PDFs that show the exact image of a scanned document, but there is a kind of hidden text in the PDF that can be selected when you try to select a portion of the image that contains text. In this way you can copy and paste text or search for text in the scanned document. This is VERY useful. It's an awesome improvement over raw scanned images. I also have several apps on my mac that can create this kind of searchable PDF from a scanned document or a raw image.
Now it's obvious from any who has ever used OCR that the process of converting images to text is not 100% accurate, so the text that you search or copy will not be correct in some places.
So I search for quite some time to find an application that would load a searchable PDF and allow me to repair the hidden searchable text without reformatting or modifying the original scanned image.
Does anyone know of a tool (or library API) that would allow this?
It's worth saying here that I tried the latest version of Adobe Acrobat DC for Mac, and it doesn't seem to even allow me to view the hidden searchable text, much less edit it. It does allow me to replace scanned image with the results of it's own OCR process so that I could edit and save the document. But this would produce horrible results for any of the scanned documents that I am using. It seems designed for editing a "native PDF" not editing a scanned document.
I have also tried ABBYY FineReader with no luck.
i'm using ABBYY FineReader 12 Professional. (not open source)
Just open a scanned image or scanned pdf and press Verify Text(or Ctrl + F7), than you go over all the spelling errors or low-confidence charachters and fix them.
The program is very good, it shows you the exact place in image/pdf to correct and the OCR guessing side by side for convenience. It iterates all of them.
[By the way, I'm using the shortcuts to speed up things:
Alt+Enter to add the unrecognized word to dictionary.
Ctrl+Delete to skip word or confirm in case you fixed it.]
Than save the document as a pdf file Menu:File>Save Document As> PDF File, and you can search it on every pdf reader. The saved file look the same as the scanned one, but 'behind' it there text.
It's weird you tried ABBYY with no luck... it's working great for me. maybe you tried not the Professional version.
Hope it helps you.
It is not creating a searchable pdf from images the poster is after, he wants to start with an already searchable pdf and modify its text (e.g. because intially a searchable pdf was made but later an overlooked error in recognition was found and needs correction). I see no way and no tool that assists in doing this.

Fix PDF with unreadable characters

Example PDF page: https://db.tt/qRcF000k
This is sample page from a document, where copied text shows as question marks in my favorite reader SumatraPDF (mupdf) just the same as in Adobe Acrobat. But my main problem is that I can not search this document because of this, nor I can index it.
OTOH, xpdf's pdftotext extracts correct text.
In Adobe Acrobat if I use "Copy as formatted text", correct text is written to clipboard, although I still can't search from Acrobat.
Also if I open the linked page in Firefox's built-in PDF reader I can correctly copy the text.
Can GhostScript perhaps be instructed to correct this issue, which I can not describe differently then as 'unreadable characters'?
The PDF file uses subset fonts with non-standard Encodings and no ToUnicode CMaps. So no, you can't have Ghostscript 'correct' this file.
In fact I can't see how anything can possibly be extracting sensible text from this, and indeed my version of Acrobat (Pro X and Reader XI) can't copy meaningful text and don't appear to have a 'copy as formatted text' menu item, can you tell me where to find this ?
However, I notice that the PDF file has actually been created by Ghostscript (version 9.14) so possibly you mean 'starting with a different input file, which I haven't given you, could I have generated a PDF file where the text could be copied', to which I can only say 'I don't know', it depends what was in the original input file .