Track and output changes/diffs between LaTeX document revisions to PDF? - pdf

We want to keep track of changes in a LaTeX document in such a way that people who can't read LaTeX can also see the changes at once. The .tex files are stored in a git repository. So detailed information about the changes is available.
For this purpose I think it might be possible to use the git diff output between two revisions to generate the PDF and somehow mark the changes since the selected other revision of the document.
Do you know of an (easy) way to achieve this?
Do you know of other ways to visualize differences between PDF files?

[Expanding on my comment, since it apparently helped :-) ]
latexdiff is a Perl script that can diff two LaTeX documents and mark up changes without the distractions of the LaTeX markup itself. The README says:
latexdiff is a Perl script, which compares two latex files and marks
up significant differences between them (i.e. a diff for latex files).
Various options are available for visual markup using standard latex
packages such as "color.sty". Changes not directly affecting visible
text, for example in formatting commands, are still marked in
the latex source. Note that only files conforming to latex syntax will
be processed correctly, not generic TeX files. Some further
minor restrictions apply, see documentation.
A rudimentary revision facilility is provided by another Perl script,
latexrevise, which accepts or rejects all changes. Manual
editing of the difference file can be used to override this default
behaviour and accept or reject selected changes only.
The author is F Tilmann.
The project is developed on Github, but you can get the script in a tarball from CTAN if you prefer. The link in the comment is a useful overview of how to use it.

Related

GhostScript creating extra page when font errors occur

I have a process that needs to write multiple postscript and pdf files to a single postscript file generated by, and that will continue to be modified by, word interop VB code. Each call to ghostscript results in an extra blank page. I am using GhostScript 9.27.
Since there are several technologies and factors here, I've narrowed it down: the problem can be demonstrated by converting a postscript file to postscript and then to pdf via command line. The problem does not occur going directly from postscript to pdf. Here's an example and an example of the errors.
C:\>"C:\Program Files (x86)\gs\gs9.27\bin\gswin32c.exe" -dNOPAUSE -dBATCH -sDEVICE=ps2write -sOutputFile=C:\testfont.ps C:\smallexample.ps
C:\>"C:\Program Files (x86)\gs\gs9.27\bin\gswin32c.exe" -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=C:\testfont.pdf C:\testfont.ps
Can't find (or can't open) font file %rom%Resource/Font/TimesNewRomanPSMT.
Can't find (or can't open) font file TimesNewRomanPSMT.
Can't find (or can't open) font file %rom%Resource/Font/TimesNewRomanPSMT.
Can't find (or can't open) font file TimesNewRomanPSMT.
Querying operating system for font files...
Didn't find this font on the system!
Substituting font Times-Roman for TimesNewRomanPSMT.
I'm starting with the assumption that the font errors are the cause of the extra page (if only to rule that out, I know it is not certain). Since my ps->pdf test does not exhibit this problem and my ps->ps->pdf does, I'm thinking ghostscript is not writing font data that was in the original postscript file to the one it is creating. I'm looking for a way to preserve/recreate that in the resulting postscript file. Or if that is not possible, I'll need a way to tell ghostscript how to use those fonts. I did not have success attempting to include them as described in the GS documentation here: https://www.ghostscript.com/doc/current/Use.htm#CIDFontSubstitution.
Any help is appreciated.
I've made this an answer, even though I'm aware it doesn't answer the question, becasue it won'f fit as a comment.
I think your assumption that the missing fonts are causing your problem is flawed. Many PDF files do not embed all the fonts they require, I've seen many such examples and they do not emit extra pages.
You haven't been entirely clear in your description of what you are doing. You describe two processes, one going from PostScript to PDF and one going from PostScript on to PostScript (WHY ?) and then to PDF.
You haven't described why you are processing PostScript into a PostScript file.
In particular you haven't supplied an example file to look at. Without that there's no way to tell whether your experience is in fact correct.
For example; its entirely possible that you have set /Duplex true and have an odd number of pages in your file. This will cause an extra blank page (quite properly) to be emitted, because duplexing requires an even number of pages.
The documentation you linked to is for CIDFont substitution, it has nothing to do with Font substitution, CIDFonts and Fonts are different things in PDF and (more particularly) PostScript. But I honestly doubt that is your problem.
I'd suggest that you put (at the least) 'smallexample.ps' somewhere public and post the URL here, that way we can at least follow the same steps you are doing. That way we can probably tell you what's going on. An explanation of why you're doing this would be useful too, I would normally strongly suggest that you don't do extra steps like this; each step carries the risk of degarding the output in some way.
Thank you for the response. I am posting as an answer as well due to the comment length restrictions:
I think you are correct that my assumption about fonts is wrong. I have found the extra page in the second ps file and do not encounter the font errors until the second conversion.
I have a process that uses VB MSWord Interop libraries to print multiple documents to a single ps file using a virtual printer set up with ghostscript and redmon. I am adding functionality to mix in pdf files too. It works, but results in an extra page. To narrow down where the problem actually was, I tried much simpler test cases via command line to identify the problem. I only get the extra page when ghostscript is converting ps to ps (whether or not there is a pdf as well). Converting ps to pdf I do not get the extra page. Interestingly, I can work around the problem by converting the ps to pdf and then both pdfs back to ps. That is a slower and should not be necessary however, so I would like to identify and resolve the extra page issue. I cannot share that particular file. I'll see if I can create an example I can share that also exhibits the problem. In the meantime, I can confirm that the source ps file is six pages and the duplexing settings are as follows. There is duplex definition in the resulting ps file with the extra page. Might there be some other common culprits I could check for in the source ps? Thank you.
featurebegin{
%%BeginFeature: *DuplexUnit NotInstalled
%%EndFeature
}featurecleanup
featurebegin{
%%BeginFeature: *Duplex None
<</Duplex false /Tumble false>> setpagedevice
%%EndFeature
}featurecleanup

Tables or images too wide in Pandoc output as DOCX or PDF/LaTeX

I am writing a quick and dirty report using pandoc and markdown.
I need to generate a PDF or a DOCX with minimum hassle, I don't care much about which (best would be both, of course). Also, I am somewhat constrained regarding the figures and tables -- they have been generated a priori with another program and I would rather be able to insert them as they are then to convert them to suit pandoc's needs.
However, the main constraint is that I don't want to edit the resulting document manually, be that LaTeX or DOCX. I want to do all editing in markdown.
Here is the problem:
In DOCX, the tables are displayed fine: they have the width of the document. However, the figures are much too wide. I can either convert the images to a lower resolution (which doesn't look nice), or manually resize the images in Word (which is out of question).
In PDF, the generated figures are fine (more or less), however another two problems appear:
The tables are too wide, because there are no line breaks, and
LaTeX being LaTeX, the order of figures and tables are "reorganized", that is, they are not consecutive.
Thus, none of the documents generated are usable for my purposes.
All I wanted to do is to slap together some results and generate a file that I can send to another scientist.
Question: what is the best solution to generate a quick and dirty report in pandoc with minimum effort and at least all results visible?
Update: Upgrading pandoc to 1.4 or later solves the issue -- the figures have now correct sizes in docx documents.
Control over image size
Currently you cannot control that feature directly from Markdown. For LaTeX/PDF output, this is automatically handled by LaTeX/pdflatex itself.
In recent months there have been some discussions going on in the Pandoc developer + user community about how to best implement it and create an easy-to-use syntax, for example
![Image Caption](./path/to/image.jpg "Image Comment"){width="60%", height="150px"}
(Warning: Example only, made up on the spot + extracted from thin air by myself -- can't remember the latest state of the discussion...) This is designed to then transfer to all the supported output formats which can contain images, not just to LaTeX/PDF.
So something along these lines is planned to be a major new feature for the next major release of Pandoc, and will start to be working better in ODT/DOCX output as well.
Control over table/cell widths and line breaks within cells
How exactly do you specify your tables in Markdown syntax?
Are you aware that Pandoc supports several variations like gid_tables, pipe_tables, simple_tables and multiline_tables?
You should look into using pandoc --from=markdown+multiline_tables ... as your command and write the critical tables as multiline_tables in your Markdown.
Read all about the details via man pandoc_markdown...
Multiline tables give you a limited control over the width of individual columns in the output, just by widening or narrowing the column widths in the markdown source itself.
Order of figures and tables when outputting LaTeX/PDF
Pandoc supports the insertion of raw_tex lines and environments into the Markdown source file. When it encounters such lines, it transmits them un-changed into its LaTeX output. (But it will be ignored for all other outputs.)
So you can insert lines like
\newpage{}
into the Markdown to enforce a page break. This already gives you some limited control over keeping the order of mis-behaving figures or tables. (After all, you said you look for a "quick and dirty" method, not a sophisticated typeset document...)
Of course, if you know LaTeX more and better, you can also use stuff like
/FloatBarrier inside your Markdown.
Going down that road (mixing LaTeX code into Markdown) gives you a few disadvantages:
The Markdown will not look as pretty any more.
The Markdown will not work fully with other output formats (should you need them).
But the advantage still are:
You will be writing and modifying the document text much faster in Markdown than authoring it in LaTeX.
You have some additional control over the final look of your PDF:
order of tables + figures
look + width of tables + figures (because, you can of course insert a complete LaTeX 'figure' or 'table' environment).

Creating ODT and PDF files as end result

I've been working on an app to create various document formats for a while now, and I've had limited success.
Ideally, I'd like to dynamically create a fairly simple ODT/PDF/DOC file. I've been focusing my efforts on ODT, because it is editable, and open enough that there are several tools which will convert it to any of the other formats I need.
The problem is that the ODT XML files are NOT simple, and there aren't any good-quality API's I could find (especially in python). So far, I've had the most success creating a template ODT file, and then manipulating the DOM in python as needed. This is ok generally, but is quickly becoming inadequate and requires too much tweaking every single time I need to alter one of the templates.
The requirements are:
1) Produce a simple document that will have lists, paragraphs, and the ability to draw simple graphics on the page (boxes, circles, etc...)
2) The ability to specify page size, and the different formats should generally print the exact same output when sent to a printer
My questions:
1) Are there any other ways I can produce ODT/PDF/DOC files?
2) Would LaTeX be acceptable? I've never really used it, does anyone have experience converting LaTeX files into other formats?
3) Would it be possible to use HTML? There are a lot of converters online. Technically you can specify dimensions in mm/cm, etc..., but I am worried that the printed output will differ between browsers/converters....
Any other ideas?
have you tried pandoc? i've been using it with good success for the conversion of different formats into each other. why try to invent the wheel twice?
I suppose to be successful, you'd have to define how you want to input everything. Why don't you just use openoffice? it will save to ODT (duh...), PDF, and HTML (though it's not clean HTML, it's actually quite ugly).
In my recent experience, I've had success going from latex -> xhtml via LaTeXML (i had to compile from source). LaTeX is seeming more and more like a terminal format. It's great for PDF, but once you need some flexibility, it kind of fails. I should also note that there is no latex -> dvi in my workflow, so I can't comment on things like tex4ht that reads out of a dvi file (I have too many graphics that don't work with DVI to switch them now).
Shortly I'll be moving everything into docbook 4.5-- i like the docbook-utils package which supports latex, html, and i even saw a converter to ODT. But docbook is super-heavy on the markup, which is annoying, but it will provide me with the flexibility i need going forward.
Since you're using python, have you just considered using ReStructured Text?
I've also really enjoyed publishing from emacs' orgmode, which is a super light weight markup that goes into a bunch of different formats.

Generating PDF documents from LISP

I want to generate a technical report from lisp (AllegroCL in my case) and I studied various packages/project to help me do this.
Requirements:
Need to generate a PDF
May create an intermediate format like RTF, Restructured TEXT, HTML, Word DOC or Latex
Need to be flexible to be able to add content throughout my application
Need to handle Multi-Page, Headers, Footers, Tables, inclusion of Images.
Possibilities:
cl-pdf and cl-typesetting: I checked this one out and it works for now, but is there a better alternative?
Some Latex generator, but ???
Question:
Do you know alternatives to easily generate (PDF) reports from lisp. What is the best workflow to go for?
we are using cl-pdf and cl-typesetting for the last 3 years and it has numerous issues... (like its confusion around encodings, or silently not rendering things that don't fit, or...) so, i don't recommend new development based on them.
currently we are in the process of moving all our export mechanisms to open document format. openoffice is all happy with it, and there's a plugin for ms office, too.
there's .fodt, the so called flat open document text format, which is a mere xml file describing a document. generating it is as easy as generating xml files.
you can also make parts of your document read-only with a password (insert a section and mark it read-only and protected by a password. when generating the xml, you can generate random hashes as password...).

What are the relative merits of pdflatex?

Not sure this is a programming question, but we use LaTeX for all our API documentation and user documentation, so I hope it will go through.
Can someone please explain what are the relative merits of using pdflatex as opposed to the "classic" technique of
latex foo
dvips -Ppdf foo
ps2pdf foo.ps
From time to time I run into people who have difficulty because things don't work in pdflatex, and I know that using pdflatex gives up two things I have grown to value:
Can't use the very speedy xdvi viewer
Can't use the PStricks package
I should add that I typically get PDF with hyperlinks by using something on the order of
\usepackage[ps2pdf,colorlinks=true]{hyperref}
so it's not necessary to use pdflatex to get good PDF.
So
What are the advantages of pdflatex that I don't know about?
What are the disadvantages of the old tools that I've overlooked?
My favorite pdflatex feature is the microtype package, which is available only when using pdflatex to go directly to PDF, and really produces stunning results with no effort on my part. Apart from that, the only caveats I run into are image formats:
pdflatex supports PDF, PNG, and JPG images.
the postscript drivers support (at least) EPS.
Also, if you want to install fonts, the procedures are slightly different depending on what fonts that driver supports. (Hint: use XeTeX to instantly enable OpenType fonts.)
As it turns out, I recently read a post that shows the difference directly. Any document that uses tables or narrow columns will be improved automatically. I also find the inter-word spacing to be far more pleasing with pdflatex.
Is xdvi much faster than xpdf? I find the edit, TeX, view cycle to be very quick with pdflatex.
Have you tried MetaPost or MetaFun for graphics? I tend to put graphics creation in the hands of the capable, but MetaFun would likely be the package I'd use. Just reading the manuals is a pleasure.
Also pdftex is the engine under development (towards luatex) and maintenance. I'm not sure the DVI counterparts are as actively maintained.
PStricks is supplanted by Tikz.
I didn't use xdvi in years, so pardon the trollish rhetorical questions: Does xdvi display vector fonts? Does it support synctex (jumping to and from code)? Does it have the confort of use of PDF readers like Skim?
Taco Hoekwater is working on Escrito, a Postscript interpreter written in Lua, which would allow you to use pstricks in Luatex. He has an impressive project completion record: maybe I should have used "will" rather than "would" in the previous sentence.
I used pdflatex to generate the PDF for my ICFP 2009 paper. (I still needed to use standard latex to generate the PostScript file.) I did so for two reasons:
I couldn't seem to get ps2pdf to generate Letter, rather than A4 output, no matter what command line options I used.
For the printers, I needed to produce a version 1.3 PDF file, not 1.4. pdflatex made this easy to do. I set the PDF author and title information while I was at it.
Both of these problems may be fixable in some way, but as a first-time latex user, I didn't find any obvious solutions, nor did more experienced users whom I'd asked.