Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 8 years ago.
We use Adobe InDesign to design story books, and we need both a PDF file and an EPUB file for each book. We review everything in PDF during the process, and the final PDF looks clean. But when we export the same document as an EPUB file, the file is huge and the original design is completely messed up. What can we do?
Why did it happen?
I worked on ONE project going from InDesign to ePub about two years ago, and you are right, it is a mess. The export didn't understand which local overrides to keep, and practically every paragraph had style="localoverride1 localoverride2 substyle3 etc" in it. It was a pain to sort out and clean up.
After that miserable experience we've found that it is better to view PDF and ePub as two separate products. Our workflow takes source XML and goes EITHER into InDesign OR through an XSLT to make an ePub. We no longer use InDesign to attempt to make ePubs - with an XSLT there is a LOT more control over the look and feel of the final product.
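To give a rough idea of what "source XML into ePub markup" can look like, here is a minimal sketch. It is written in plain Python with `xml.etree.ElementTree` rather than XSLT, and it is not the actual stylesheet from this workflow. The element names (`story`, `title`, `para`) and the CSS class names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping: source element -> (XHTML tag, CSS class).
# Every output paragraph gets exactly one class that we chose ourselves.
TAG_MAP = {
    "title": ("h1", "story-title"),
    "para": ("p", "body"),
}


def story_to_xhtml(source_xml: str) -> str:
    """Turn a small <story> document into one XHTML page for an ePub."""
    story = ET.fromstring(source_xml)
    html = ET.Element("html", {"xmlns": "http://www.w3.org/1999/xhtml"})
    head = ET.SubElement(html, "head")
    ET.SubElement(head, "title").text = story.findtext("title", default="")
    body = ET.SubElement(html, "body")
    for child in story:
        if child.tag not in TAG_MAP:
            continue  # skip elements the mapping does not know about
        tag, css_class = TAG_MAP[child.tag]
        out = ET.SubElement(body, tag, {"class": css_class})
        out.text = child.text
    return ET.tostring(html, encoding="unicode")


source = "<story><title>The Fox</title><para>Once upon a time.</para></story>"
print(story_to_xhtml(source))
```

The point is only that the markup and class names are decided by the transform, so the result can be styled with one small stylesheet instead of cleaning up exported overrides.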
However if you are dead set on using InDesign - I've heard that it does fixed layout "epub" fairly well (basically it ends up being a bunch of images - it's not reflowable).
Closed 8 years ago.
When I save a .psd as a Photoshop PDF, some fonts don't look the way they did in Photoshop.
They are not completely filled like they are in Photoshop. See the attachments:
This is how it looks in Photoshop:
https://s30.postimg.org/hhzop7ksh/Screen_Shot_2014_12_15_at_13_32_23.png
And this is what it looks like in, for example, Google Chrome's PDF reader:
https://s30.postimg.org/v06l1hwxt/Screen_Shot_2014_12_15_at_13_43_54.png
As you see, there is a white area in the font. How do I fix this?
I bet the PDF you created from Photoshop did not embed all fonts used by the PDF.
The consequence is that any PDF reader having to deal with this document needs to use a substitute font.
How to fix this? The first step is to make sure your Photoshop-created PDF embeds all the fonts it uses. (Then check whether that already gives you what you expect, or whether more fixes are needed.)
Closed 6 years ago.
I have a PDF with the following text:
Localização
When I copy this text and paste, it gives me:
localizac¸ ˜ao
Any help is appreciated.
Thanks
For computer-generated documents (not OCR'd/scanned)
Some systems, such as LaTeX, generate composed characters when the font in use doesn't contain (or support) the glyph in the current encoding. As a consequence, those characters are built on the fly from composed glyphs.
Making two glyphs look like one:
A + ´ -> Á
Because of this 'trick', the selectable PDF Text Information contains the two separated glyphs. But graphically they are both rendered at the same spot.
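The same effect can be seen on the text side with Python's `unicodedata` module. This is only an illustration, and it assumes the pasted accent is the spacing character U+00B4 (ACUTE ACCENT), which is what such PDFs typically give you when copying:

```python
import unicodedata


def describe(text: str) -> list[str]:
    """Return the Unicode name of each code point in text."""
    return [unicodedata.name(ch) for ch in text]


# A base letter plus a *combining* accent merges into one character under NFC.
combining = unicodedata.normalize("NFC", "A\u0301")

# A *spacing* accent next to a base letter is left as two separate characters.
spacing = unicodedata.normalize("NFC", "\u00b4A")

print(describe(combining))  # ['LATIN CAPITAL LETTER A WITH ACUTE']
print(describe(spacing))    # ['ACUTE ACCENT', 'LATIN CAPITAL LETTER A']
```

So normalization alone does not repair the copied text. The spacing accents have to be turned into combining ones first.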
The quick solution:
Luckily, the generated character pairs do not occur naturally in well-written text (probably in any language), so it is quite safe to search and replace them using a case-sensitive method. You can do it manually in your favorite text editor, with a Python script, etc. Automated or not, the principle of the solution is the same.
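A minimal sketch of such a script follows. It makes a few assumptions that you should check against your own PDF: that the pasted accents are the spacing characters listed in the table, that above-letter accents come before the base letter (as in "˜a"), and that the cedilla comes after it (as in "c¸"). Dropping a single space next to an accent mark is a heuristic based on the sample in the question. The function name is made up.

```python
import re
import unicodedata

# Spacing accent (as pasted) -> combining accent that NFC can merge.
ACCENTS_BEFORE = {
    "\u00b4": "\u0301",  # acute
    "\u02c6": "\u0302",  # circumflex
    "\u02dc": "\u0303",  # tilde
    "\u00a8": "\u0308",  # diaeresis
}
CEDILLA_MARK = "\u00b8"
COMBINING_CEDILLA = "\u0327"


def fix_composed_glyphs(text: str) -> str:
    """Merge 'accent + letter' and 'letter + cedilla' pairs into one character."""
    marks = "".join(ACCENTS_BEFORE)
    # Case 1: accent mark, at most one space, then the base letter ("˜a").
    text = re.sub(
        rf"([{marks}]) ?([A-Za-z])",
        lambda m: m.group(2) + ACCENTS_BEFORE[m.group(1)],
        text,
    )
    # Case 2: base letter followed by the cedilla mark ("c¸").
    # Heuristic: also drop one space after the mark when more text follows.
    text = re.sub(
        rf"([A-Za-z]){CEDILLA_MARK}(?: (?=\S))?",
        lambda m: m.group(1) + COMBINING_CEDILLA,
        text,
    )
    # NFC turns "letter + combining mark" into the single precomposed letter.
    return unicodedata.normalize("NFC", text)


print(fix_composed_glyphs("localizac\u00b8 \u02dcao"))  # localização
```

Because the space handling is a guess, a word that really ends in "ç" and is followed by another word would be glued to it. Review the output before trusting it on a whole document.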
It is important to know how you are copying the text. If you are merely using a text editor and altering the underlying PDF code, you are going to have problems. PDF files are organized in a complicated, non-human-readable way that requires specialized programs to alter successfully. If you want to make this change, you will need to use a PDF editor to either edit the document or generate a new document from scratch.
Closed 2 years ago.
Is there a free way to go through a bunch of image-only PDF files and folders (in different locations) and OCR them?
I would be really interested in one. Please suggest something.
Try VietOCR, which monitors a watch folder for new input images. The program requires GhostScript to recognize PDF format.
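If the files are spread over several locations, a small script can gather them and hand each one to whatever OCR tool you settle on. This is a minimal sketch and not part of VietOCR. The function names are made up, and `ocr_one` is a placeholder for the real OCR call (for example, copying the file into the watch folder).

```python
from pathlib import Path
from typing import Callable, Iterable


def find_pdfs(roots: Iterable[str]) -> list[Path]:
    """Collect every .pdf file under the given folders, including subfolders."""
    found = []
    for root in roots:
        for path in Path(root).rglob("*"):
            if path.is_file() and path.suffix.lower() == ".pdf":
                found.append(path)
    return sorted(found)


def ocr_all(roots: Iterable[str], ocr_one: Callable[[Path], None]) -> int:
    """Call ocr_one on each PDF found and return how many files were handed over."""
    pdfs = find_pdfs(roots)
    for pdf in pdfs:
        ocr_one(pdf)  # placeholder: plug in the actual OCR step here
    return len(pdfs)
```

Calling `ocr_all(["folder_a", "folder_b"], print)` would just list the PDFs it finds, which is a cheap way to check the folder list before running any OCR.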
I recommend OCRvision OCR PDF software. It has OCR folder watch where you can configure any folder as a monitored folder and the software will auto-OCR the PDF files there and convert any new scanned documents to searchable PDF. You can download the software from the web site.
PS: I work for OCRvision.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
Is there a free way to read PDF files through VBA to extract basic text content? I need to automate a weekly data acquisition process at my company where data is contained in PDF files (which are updated weekly by the data provider). Also, is there a reference I can look into to understand the file structure (DOM?) of a PDF?
Adobe's PDF reference is online here: http://www.adobe.com/devnet/pdf/pdf_reference.html
I'm not sure about the best way to read PDFs from VBA directly, but if you can call an external Java or C# program, then I would recommend using iText for basic text extraction.
EDIT: I should maybe mention that Adobe's PDF reference is an 800-page beast. I found that it's good for looking up answers to particular questions (e.g., storing widths of embedded TrueType fonts), but it may not be a good place to start. For that, reading through the iText book helped me get started on the format.
The iText book contains lots of worked examples for general PDF tasks and lots of background info to help you understand PDF files. It more than pays for itself very quickly!
Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
I have a CV in PDF format which is to be converted to LaTeX code. Is there a way to 'reverse engineer' the PDF so that I can get the LaTeX code?
Short answer: No
Slightly longer answer:
You may get the plain text back, but you can't restore the original LaTeX source.
You may be able to import the PDF into a word processor and export LaTeX from it (either AbiWord or KOffice can do that, if I remember correctly), but the result will not be pretty. This won't get you the original LaTeX, only a very poor approximation. I think recreating the CV from scratch in LaTeX will be easier.
No. An explanation can be found here:
The job just can’t be done automatically: DVI, PostScript and PDF are
“final” formats, supposedly not susceptible to further editing —
information about where things came from has been discarded. So if
you’ve lost your (La)TeX source (or never had the source of a document
you need to work on) you’ve a serious job on your hands. In many
circumstances, the best strategy is to retype the whole document, but
this strategy is to be tempered by consideration of the size of the
document and the potential typists’ skills.
Just like you can automatically reverse engineer C code (though not very readable and with certain limitations) from a compiled exe you should be able to reverse engineer the LaTeX code from a compiled PDF. There just don't seem to be any tools around that even attempt this. This would sure be an interesting thing to implement.
There's some research going on in that area:
http://www.fi.muni.cz/~sojka/dml-2011-baker-sexton-sorge.pdf
The LaTeX file will have been printed to PDF, converting the contents into PostScript commands.