Get LaTeX code out of a PDF file [closed]

Closed. This question is off-topic. It is not currently accepting answers.
Closed 9 years ago.
I have a CV in PDF format that I need to convert to LaTeX. Is there a way to 'reverse engineer' the PDF so that I can get the LaTeX code?

Short answer: No
Slightly longer answer:
You may get the plain text back but you can't restore the original latex source.
You may be able to import the PDF into a word processor and export LaTeX from it (either AbiWord or KOffice can do that, if I remember correctly), but the result will not be pretty: it won't be the original LaTeX, just a very poor approximation. I think recreating the CV from scratch in LaTeX will be easier.

No. An explanation can be found here:
The job just can’t be done automatically: DVI, PostScript and PDF are
“final” formats, supposedly not susceptible to further editing —
information about where things came from has been discarded. So if
you’ve lost your (La)TeX source (or never had the source of a document
you need to work on) you’ve a serious job on your hands. In many
circumstances, the best strategy is to retype the whole document, but
this strategy is to be tempered by consideration of the size of the
document and the potential typists’ skills.

Just as you can automatically reverse engineer C code from a compiled exe (though not very readable and with certain limitations), you should be able to reverse engineer LaTeX code from a compiled PDF. There just don't seem to be any tools around that even attempt this. It would certainly be an interesting thing to implement.
There's some research going on in that area:
http://www.fi.muni.cz/~sojka/dml-2011-baker-sexton-sorge.pdf

The LaTeX file will have been printed to PDF, converting the contents into PostScript-style drawing commands (fonts, coordinates and glyph strings) that no longer record the original markup.
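To make that concrete, here is a minimal Python sketch that pulls the strings out of a page content stream. It assumes a hypothetical, uncompressed stream with plain Tj operators; real streams are usually Flate-compressed, and pdfTeX typically emits TJ arrays with kerning instead. What survives is a font selection, a position and a run of characters; nothing says "this was \section" or "this was \textbf".

```python
import re

# Hypothetical, uncompressed page content stream of the kind a LaTeX engine
# might produce for a CV heading (real streams are normally compressed).
SAMPLE_STREAM = """
BT
/F1 14.35 Tf
72 720 Td
(Curriculum Vitae) Tj
/F2 9.96 Tf
0 -24 Td
(Jane Doe) Tj
ET
"""

# A literal string operand followed by the Tj ("show text") operator.
# Simplification: ignores escaped parentheses, TJ arrays and hex strings.
TJ_RE = re.compile(r"\((.*?)\)\s*Tj")


def extract_text(stream):
    """Return the text strings shown on the page, in stream order."""
    return TJ_RE.findall(stream)


print(extract_text(SAMPLE_STREAM))  # ['Curriculum Vitae', 'Jane Doe']
```

The fonts (/F1, /F2) and offsets (Td) are the only hints about structure, which is why any reconstruction of the LaTeX source has to be guesswork.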

Related

What is the correct term for a file something.zip? ".zip" or "zip"? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 5 years ago.
I am looking for the correct term to use when writing about a .zip file (presumably the answer will be the same for other file types as well). I am working on the documentation for a project, and other authors have used both ".zip" and "zip" before. Sources on the internet seem inconsistent, and I can't find a straightforward answer. Any authoritative sources on which term is correct are very welcome.
The most trusted source I find is the Python documentation, which seems to consistently use "zip".
Meanwhile, the Wikipedia page seems to mix "zip" and ".zip".
zip is accepted as the name of the lossless compression format developed in the late 1980s by Phil Katz.
.zip denotes a file extension that is usually associated with that format. Not all zip-format files have the .zip extension; notable examples are Java's .jar files and the newer Excel formats .xlsx and .xlsm.
If I were you, I'd adopt the former (i.e. zip), and amend Wikipedia if you get the chance.
(Cf. tarball and .tar).
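The format/extension distinction can be checked directly: a zip archive is recognized by its content, not its name. The Python sketch below writes the same tiny archive under several names and checks both the "PK\x03\x04" local-file-header signature and zipfile.is_zipfile. The file names are made up, and the .jar and .xlsx files here are just zip containers with those names, not valid Java libraries or workbooks.

```python
import os
import tempfile
import zipfile

ZIP_SIGNATURE = b"PK\x03\x04"  # signature of a zip local file header


def has_zip_signature(path):
    """True if the file starts like a zip archive, whatever its extension."""
    with open(path, "rb") as f:
        return f.read(4) == ZIP_SIGNATURE


def check_names(names):
    """Write a tiny zip under each name (plain text for .txt) and test it."""
    results = {}
    with tempfile.TemporaryDirectory() as tmp:
        for name in names:
            path = os.path.join(tmp, name)
            if name.endswith(".txt"):
                with open(path, "w") as f:
                    f.write("not an archive\n")
            else:
                with zipfile.ZipFile(path, "w") as zf:
                    zf.writestr("hello.txt", "hello")
            results[name] = (has_zip_signature(path), zipfile.is_zipfile(path))
    return results


print(check_names(["archive.zip", "library.jar", "report.xlsx", "notes.txt"]))
```

Every name except notes.txt reports the zip format, which is the sense in which "zip" names the format and ".zip" only names one common extension for it.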
I am not sure you'll find a written rule about this (I'm sure you've done some Googling already and come up empty), but I can give you the advice I give the coders and analysts I work with: Be Consistent. If you choose one way, stick to it; otherwise people will assume you're switching for a reason.
Wikipedia is collaboratively edited, so you'll almost always find inconsistencies; the Python docs are more stringently edited and proofread.
Personally, I always put the "." before referencing any file type I believe my audience may not be intimately familiar with to alert them that it is a file extension. In analytics, I can get away with CSV, TXT, and XLSX, but I add it for .ZIP, .GZ, etc.

How to make InDesign's epub file vs. PDF file compatible? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 8 years ago.
We used Adobe InDesign to design storybooks. We need both a PDF file and an EPUB file. Since everyone reviews the PDF during the process, the final product looks clean as a PDF, but when we export it as an EPUB file, the file is huge and the original design is completely messed up. What can we do?
Why did it happen?
I worked on ONE project going from InDesign to ePub about two years ago, and you're right: it is a mess. InDesign didn't understand which local overrides to keep, and practically every paragraph had style="localoverride1 localoverride2 substyle3 etc" in it. It was painful to sort out and clean up.
After that miserable experience we've found that it is better to view PDF and ePub as two separate products. Our workflow takes source XML and goes EITHER into InDesign OR through an XSLT to make an ePub. We no longer use InDesign to attempt to make ePubs - with an XSLT there is a LOT more control over the look and feel of the final product.
However, if you are dead set on using InDesign, I've heard that it does fixed-layout ePub fairly well (basically it ends up being a bunch of images, so it's not reflowable).
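The "source XML to ePub" branch of that workflow boils down to mapping each semantic element to exactly one styled XHTML element. The answer uses XSLT; the sketch below is a Python/ElementTree stand-in for the same idea, not the answerer's stylesheet, and the element names (story, title, para, dialogue) and CSS classes are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical source vocabulary -> (XHTML element, CSS class). Every source
# element gets exactly one class, so no per-paragraph "local override"
# classes can creep into the EPUB markup.
STYLE_MAP = {
    "title": ("h1", "story-title"),
    "para": ("p", "body-text"),
    "dialogue": ("p", "dialogue"),
}


def story_to_xhtml_body(source_xml):
    """Map each child of the root element to one styled XHTML block."""
    story = ET.fromstring(source_xml)
    body = ET.Element("body")
    for child in story:
        # Unknown elements fall back to plain body text.
        tag, css_class = STYLE_MAP.get(child.tag, ("p", "body-text"))
        block = ET.SubElement(body, tag, {"class": css_class})
        # Simplification: text inside nested inline elements is dropped.
        block.text = child.text
    return ET.tostring(body, encoding="unicode")


sample = "<story><title>The Fox</title><para>Once upon a time.</para></story>"
print(story_to_xhtml_body(sample))
```

The design point is that the EPUB's look is then controlled by one small stylesheet keyed on those few classes, independently of how the InDesign layout for the PDF is styled.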

How to convert PDF files to spreadsheets [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 6 years ago.
I have been trying all day to convert several .pdf files containing traffic-flow data for São Paulo into spreadsheets (MS Office Excel, or LibreOffice Calc on Ubuntu). When I open a .pdf file with LibreOffice Calc, it opens in LibreOffice Draw instead, and I can't get a spreadsheet.
The most promising method I found uses pdftotext. It works, and I can get the tables into LibreOffice Calc, but only after adjusting the columns manually.
My problem is that I have so many .pdf files that doing this by hand would take a very long time.
Does anyone know a better method?
Another option is to use Okular (http://okular.kde.org).
It has a table selection tool (Ctrl+5).
You may select a table, add lines for additional rows and columns and copy the resulting table into a clipboard.
It works fine for me.
Tabula can work quite well. PDF is not an easy format to extract structured information from, so it's not always possible.
Maybe the -layout option of pdftotext would be useful to you. With this option set, pdftotext tries to preserve the column layout in the resulting text file.
You can then import the text file into LibreOffice Calc with the appropriate import settings. When you open a .txt file in Calc, you are asked how to parse its content. Under Separator Options, select both Space (under "Separated by") and "Merge delimiters". This way, Calc will be able to restore the column structure (assuming the cell data doesn't contain spaces).
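Since the real problem is the number of files, the same pdftotext -layout step can be scripted. Below is a minimal Python sketch, assuming pdftotext (from poppler-utils) is on the PATH. Instead of splitting on every space, it treats runs of two or more spaces as column gaps, a heuristic that keeps multi-word cells such as street names together but will misalign rows that have empty cells.

```python
import csv
import re
import subprocess
import sys
from pathlib import Path

# Heuristic: in "pdftotext -layout" output, columns are separated by runs of
# two or more spaces, while single spaces usually occur inside a cell.
COLUMN_GAP = re.compile(r" {2,}")


def layout_text_to_rows(text):
    """Split layout-preserving text into rows of cells, skipping blank lines."""
    rows = []
    for line in text.splitlines():  # also splits on the \f between pages
        if line.strip():
            rows.append(COLUMN_GAP.split(line.strip()))
    return rows


def pdf_to_csv(pdf_path):
    """Run pdftotext on one PDF and write a .csv next to it."""
    # "-" as the output file makes pdftotext write to stdout.
    text = subprocess.run(
        ["pdftotext", "-layout", "-enc", "UTF-8", str(pdf_path), "-"],
        check=True, capture_output=True, encoding="utf-8",
    ).stdout
    csv_path = pdf_path.with_suffix(".csv")
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(layout_text_to_rows(text))
    return csv_path


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python pdfs_to_csv.py DIRECTORY   (converts every *.pdf in it)
    for pdf in sorted(Path(sys.argv[1]).glob("*.pdf")):
        print("wrote", pdf_to_csv(pdf))
```

The resulting .csv files open directly in LibreOffice Calc or Excel; you may still need to delete header and footer lines that pdftotext copies from each page.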
A tool called Able2Extract can do exactly what you want, with minimal errors.

Copy and Paste PDF text gives wrong text [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 6 years ago.
I have a PDF with the following text:
Localização
When I copy this text and paste, it gives me:
localizac¸ ˜ao
Any help is appreciated.
Thanks
For computer-generated documents (not OCR'd/scanned):
Some systems, such as LaTeX, generate composed characters because the font in use doesn't contain (or support) the required glyph in the current encoding. As a consequence, the character is built on the fly from composed glyphs, making two glyphs look like one:
A + ´ -> Á
Because of this 'trick', the selectable text in the PDF contains two separate glyphs, even though graphically they are rendered at the same spot.
The quick solution:
Luckily, these generated character pairs do not occur naturally in well-written text (probably in any language), so it is fairly safe to simply search and replace them, case-sensitively. You can do it manually in your favorite text editor or with a Python script; automated or not, the principle of the solution is the same.
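A minimal Python sketch of that search/replace: instead of listing every pair by hand, it maps the spacing accent characters that show up in such copies to Unicode combining marks and lets NFC normalization fold them into single characters. The accent order (tilde, acute, circumflex and diaeresis before the letter, cedilla after it, each with an optional stray space) is an assumption based on the sample in the question; check it against your own text.

```python
import re
import unicodedata

# Spacing accents seen in copied text -> Unicode combining marks.
PREFIX_ACCENTS = {        # accent extracted *before* its base letter
    "\u00b4": "\u0301",   # ´ acute
    "\u02dc": "\u0303",   # ˜ tilde
    "\u02c6": "\u0302",   # ˆ circumflex
    "\u00a8": "\u0308",   # ¨ diaeresis
}
SUFFIX_ACCENTS = {        # accent extracted *after* its base letter
    "\u00b8": "\u0327",   # ¸ cedilla
}

# An optional single space between accent and letter is treated as debris.
_prefix_re = re.compile("([%s]) ?([A-Za-z])" % "".join(PREFIX_ACCENTS))
_suffix_re = re.compile("([A-Za-z])([%s]) ?" % "".join(SUFFIX_ACCENTS))


def fix_composed_glyphs(text):
    """Re-join letter/accent pairs split apart by PDF text extraction."""
    text = _suffix_re.sub(lambda m: m.group(1) + SUFFIX_ACCENTS[m.group(2)], text)
    text = _prefix_re.sub(lambda m: m.group(2) + PREFIX_ACCENTS[m.group(1)], text)
    # NFC turns "c + combining cedilla" into the single code point "ç", etc.
    return unicodedata.normalize("NFC", text)


print(fix_composed_glyphs("localizac¸ ˜ao"))  # localização
```

Grave accents (`) are deliberately left out, because a backtick is common in ordinary text and replacing it would cause false positives.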
It is important to know how you are copying the text. If you are merely using a text editor to alter the underlying PDF code, you are going to have problems: PDF files are organized in a very complicated, non-human-readable way that requires specialized programs to alter successfully. If you want to make this change, you will need to use a PDF editor to edit the document, or generate a new document from scratch.

Add comments in pdf [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions must demonstrate a minimal understanding of the problem being solved. Tell us what you've tried to do, why it didn't work, and how it should work. See also: Stack Overflow question checklist
Closed 9 years ago.
I want to add some text to a PDF document generated from LaTeX. The text should not be visible in the actual PDF; I want it to work more like a comment in code, so that I can load the "code" in a program and read the comments. Is this possible?
Kind regards
I don't know LaTeX well enough to comment on that part of your question, but there are a number of ways information can be stored inside PDF files that would satisfy your requirement.
Images in PDF files are typically objects (Image XObjects to be exact) - these have a dictionary where additional information could be stored next to the image data.
PDF supports the concept of object metadata where XMP metadata can be embedded in a PDF file for a specific object. This would be a second way to embed additional non-visible information in the PDF file (and a better one).
Perhaps best of all, if you can generate it from LaTeX, PDF supports marked-content properties: marked-content operators in the page content stream delimit a sequence of objects, and information can then be associated with that marked content.
All of these are described in the PDF specification on the Adobe website; what remains is to figure out how to generate any of them from LaTeX and what you'd have to do to read them in your program :-)
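On the reading side, the marked-content approach could look like the Python sketch below. BDC and EMC are the real PDF operators for "begin marked content with properties" and "end marked content", but the tag /Comment and the key /Note are made-up names, and the sketch assumes an uncompressed content stream whose property list is an inline dictionary of literal strings (no nested dictionaries, no escaped parentheses). A real program would first decompress the stream with a PDF library.

```python
import re

# Hypothetical uncompressed page content stream with one marked-content
# sequence carrying a hidden note; only "Visible text" is drawn on the page.
SAMPLE_STREAM = """
/Comment <</Note (reviewed by J. Doe)>> BDC
BT /F1 12 Tf 72 700 Td (Visible text) Tj ET
EMC
"""

# Tag, inline property dictionary, then the BDC operator.
BDC_RE = re.compile(r"/(\w+)\s*<<(.*?)>>\s*BDC", re.S)
# /Key (literal string) entries inside that dictionary.
STRING_ENTRY_RE = re.compile(r"/(\w+)\s*\((.*?)\)")


def marked_content_properties(stream):
    """Return (tag, {key: string value}) for each BDC operator in the stream."""
    return [
        (tag, dict(STRING_ENTRY_RE.findall(body)))
        for tag, body in BDC_RE.findall(stream)
    ]


print(marked_content_properties(SAMPLE_STREAM))
```

Because the note lives in the property list rather than in a text-showing operator, viewers do not render it, yet a program can still recover it.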
There are two different ways to add comments in the LaTeX source:
You can either comment out single lines by adding a % in front of them
% This text will be a comment
Or you can comment out larger sections by doing this:
\usepackage{comment} % in the preamble
\begin{comment}
This text will be commented out.
\end{comment}
Hope this helps!