What are the relative merits of pdflatex? - pdf

Not sure this is a programming question, but we use LaTeX for all our API documentation and user documentation, so I hope it will go through.
Can someone please explain what are the relative merits of using pdflatex as opposed to the "classic" technique of
latex foo
dvips -Ppdf foo
ps2pdf foo.ps
From time to time I run into people who have difficulty because things don't work in pdflatex, and I know that using pdflatex gives up two things I have grown to value:
Can't use the very speedy xdvi viewer
Can't use the PStricks package
I should add that I typically get PDF with hyperlinks by using something on the order of
\usepackage[ps2pdf,colorlinks=true]{hyperref}
so it's not necessary to use pdflatex to get good PDF.
So
What are the advantages of pdflatex that I don't know about?
What are the disadvantages of the old tools that I've overlooked?

My favorite pdflatex feature is the microtype package, which is available only when using pdflatex to go directly to PDF, and really produces stunning results with no effort on my part. Apart from that, the only caveats I run into are image formats:
pdflatex supports PDF, PNG, and JPG images.
the postscript drivers support (at least) EPS.
Also, if you want to install fonts, the procedures are slightly different depending on what fonts that driver supports. (Hint: use XeTeX to instantly enable OpenType fonts.)

As it turns out, I recently read a post that shows the difference directly. Any document that uses tables or narrow columns will be improved automatically. I also find the inter-word spacing to be far more pleasing with pdflatex.
Is xdvi much faster than xpdf? I find the edit, TeX, view cycle to be very quick with pdflatex.
Have you tried MetaPost or MetaFun for graphics? I tend to put graphics creation in the hands of the capable, but MetaFun would likely be the package I'd use. Just reading the manuals is a pleasure.

Also pdftex is the engine under development (towards luatex) and maintenance. I'm not sure the DVI counterparts are as actively maintained.
PStricks is supplanted by Tikz.
I didn't use xdvi in years, so pardon the trollish rhetorical questions: Does xdvi display vector fonts? Does it support synctex (jumping to and from code)? Does it have the confort of use of PDF readers like Skim?

Taco Hoekwater is working on Escrito, a Postscript interpreter written in Lua, which would allow you to use pstricks in Luatex. He has an impressive project completion record: maybe I should have used "will" rather than "would" in the previous sentence.

I used pdflatex to generate the PDF for my ICFP 2009 paper. (I still needed to use standard latex to generate the PostScript file.) I did so for two reasons:
I couldn't seem to get ps2pdf to generate Letter, rather than A4 output, no matter what command line options I used.
For the printers, I needed to produce a version 1.3 PDF file, not 1.4. pdflatex made this easy to do. I set the PDF author and title information while I was at it.
Both of these problems may be fixable in some way, but as a first-time latex user, I didn't find any obvious solutions, nor did more experienced users whom I'd asked.

Related

Why does the combination pdf2ps / ps2pdf shrink the PDF?

When researching how to compress a bunch of PDFs with pictures inside (ideally in a lossless fashion, but I'll settle for lossy) I found that a lot of people recommend doing this:
$ pdf2ps file.pdf
$ ps2pdf file.ps
This works! The resulting file is smaller and looks at least good enough.
How / why does this work?
Which settings can I tweak in this process?
If there is some lossy conversion, which one is that?
Where is the catch?
People who recommend this procedure rarely do so from a background of expertise or knowledge -- it's rather based on gut feelings.
The detour of generating a new PDF via PostScript and back (also called "refrying a PDF") is never going to give you the optimal results. Sometimes it is useful, f.e. in cases were the original PDF isn't printed at all, or cannot be processed by another application. But these cases are very rare.
In any case, this "roundtrip" conversion will never lead to the same PDF file as initially.
Also the pdf2ps and ps2pdf tools aren't an independent tools at all: they are just simple wrapper scripts around a Ghostscript (gs or gswin32c.exe) command line. You can check that yourself by doing:
cat $(which ps2pdf)
cat $(which pdf2ps)
This will also reveal the (default) parameters these simple wrappers use for the respective conversions.
If you are unlucky, you will have an ancient Ghostscript installed. The PostScript which is then generated by pdf2ps will be Level 1 PS, and this will be "lossy" for many fonts which could be used by more modern PDF files, resulting in rasterization of previous vector fonts. Not exactly the output you'd like to look at...
Since both tools are using Ghostscript anyway (but behind your back), you are better off to run Ghostscript yourself. This gives you more control over the parameters it uses. Especially advantageous is the fact that this way you can get a direct PDF->PDF conversion, without any detour via an intermediary PostScript file format.
Here are a few answers which would give you some hints about what parameters you could use in order to drive the file size down in a semi-controlled way in your output PDF:
Optimize PDF files (with Ghostscript or other) (StackOverflow)
Remove / Delete all images from a PDF using Ghostscript or ImageMagick (StackOverflow)

Creating ODT and PDF files as end result

I've been working on an app to create various document formats for a while now, and I've had limited success.
Ideally, I'd like to dynamically create a fairly simple ODT/PDF/DOC file. I've been focusing my efforts on ODT, because it is editable, and open enough that there are several tools which will convert it to any of the other formats I need.
The problem is that the ODT XML files are NOT simple, and there aren't any good-quality API's I could find (especially in python). So far, I've had the most success creating a template ODT file, and then manipulating the DOM in python as needed. This is ok generally, but is quickly becoming inadequate and requires too much tweaking every single time I need to alter one of the templates.
The requirements are:
1) Produce a simple document that will have lists, paragraphs, and the ability to draw simple graphics on the page (boxes, circles, etc...)
2) The ability to specify page size, and the different formats should generally print the exact same output when sent to a printer
My questions:
1) Are there any other ways I can produce ODT/PDF/DOC files?
2) Would LaTeX be acceptable? I've never really used it, does anyone have experience converting LaTeX files into other formats?
3) Would it be possible to use HTML? There are a lot of converters online. Technically you can specify dimensions in mm/cm, etc..., but I am worried that the printed output will differ between browsers/converters....
Any other ideas?
have you tried pandoc? i've been using it with good success for the conversion of different formats into each other. why try to invent the wheel twice?
I suppose to be successful, you'd have to define how you want to input everything. Why don't you just use openoffice? it will save to ODT (duh...), PDF, and HTML (though it's not clean HTML, it's actually quite ugly).
In my recent experience, I've had success going from latex -> xhtml via LaTeXML (i had to compile from source). LaTeX is seeming more and more like a terminal format. It's great for PDF, but once you need some flexibility, it kind of fails. I should also note that there is no latex -> dvi in my workflow, so I can't comment on things like tex4ht that reads out of a dvi file (I have too many graphics that don't work with DVI to switch them now).
Shortly I'll be moving everything into docbook 4.5-- i like the docbook-utils package which supports latex, html, and i even saw a converter to ODT. But docbook is super-heavy on the markup, which is annoying, but it will provide me with the flexibility i need going forward.
Since you're using python, have you just considered using ReStructured Text?
I've also really enjoyed publishing from emacs' orgmode, which is a super light weight markup that goes into a bunch of different formats.

Create print-ready PDF/X (with bleedbox, trimbox, mediabox, etc) programatically?

I was wondering if it is possible to programaticaly create a PDF file with an acceptable quality for the production press, ideally using only open-source libraries.
Right now the process is like this:
-create texts and images
-merge them into a postscript file
-use Acrobat Distiller to convert it to PDF (Acrobat distiller helps you check all the parameters of the PDF)
-send the PDF to the press
What I want is something like:
-take all texts and pictures in this folder
-encode them into the press-ready PDF, something similar to what Distiller produces
-send them to the press
How would you do that?
Many thanks...
Are the Ghostscript's gsdll32.dll and gswin{32,64}.c.exe with their source code and the GPL3 enough (or too much) of Open Source? They ship as part of all recent releases (newest one currently: v8.71).
Ghostscript can create very good quality PDF. See here for the most recent documentation about its PDF/A and PDF/X support.
Note, that this documentation until very recently was a bit misleading: it missed hinting at the requirement to edit+adapt the referred PDFA_def.ps or PDFX_def.ps templates. If you followed the old documentation without editing the templates to specifically point to the ICC color profile you wanted to embed, your output would be valid PDF, but would not pass all checks testing for compliancy with the official PDF/A+PDF/X standards.
You can generate pdfs using f.e. TeXML and XeLaTeX (first one to make scripting easier -- TeX has lots of quirks in syntax).
I also tried OpenJade and its DocBook support, but the quality was lower. TeX seems to do typesetting much better.
Both ways are using standalone programs... which you can use in shell scripts or call using system facilities.
You didn't mention which version of Distiller you're using. Recent versions do have a setting that lets you generate (different verions of) PDF/X. See also the *.joboptions files which ship with Distiller.

How to open PDF and read it?

how can I open a PDF file and read some of it's contents with Python (this language is preferred, however Ruby, Perl or PHP are fine too) (in case it is recognized (not just an image)) or report that it's impossible without OCR? TIA
Update: thanks for the solutions, I'm sure some of them will suit me fine.
#RichH, I have a pdf file, and don't know whether it is image- or text-based. I'm looking for a tool to help me find that out and in case it's text-based extract some of it's contents.
For Perl, check out these modules:
PDF::API2
CAM::PDF
Parsing PDF and making something useful out of it is hard as the format is focused on keeping the layout so text can be stored in a way that each letter is positioned individually, depending on the font the text might also be stored as graphic.
libraries to read PDFs I know include the Zend Framework which has a PDF component which includes a PDF parser which can be used from PHP and gives more or less usaable results and the commercial PDFlib which offers quite usable results and offers binding to different languages.

Converting PCL to PDF

I am looking to create (as a proof-of-concept) an OCaml (preferably) program that converts PCL code to PDF format. I am not sure where to start. Is there a standardized algorithm for doing so? Is there any other advice available for accomplishing this task?
Thanks!
Conversion of PCL to PDF can be incredibly complex (assuming you need it to be generic and not just for simple PCL). We've investaged this many times and in the end always revert to using other tools. We keep investigating as we are a development shop who uses and understands all elements of PCL to great detail. If you are not really familure with PCL it will be daunting task. One of the major issues is that overtime, printers have become, for the most part, tollerent of malformed PCL and as such, creating something that follows the rules to the letter of the law is not always sufficient. If; however, you have control over the PCL, you may be able to work it out with some amount of success.
I don't mean to turn you off of this and I realize that you've come here looking for a programming answer but I have to say, this is a far from simple task and there are no 'standarized algorithms' for this (that I'm aware of).
If this is designed to be a tool to work alongside of somehting else you are building I'd highly recommend looking at these guys:
PageTech
This is by far the most complete set of tools (Windows) for handling this. There are a few others but, based on our extensive use of PCL and conversion tools over the years, this is the only one that work all the time.
EDIT: Most recently we've been working with LincPDF (http://www.lincolnco.com/). This is also an excellent product with has one big benefit, deployment is simple. Some of the other tools have complex software installations. This solution is very easy for us to deploy as a feature in an application. It's also faster then any tools we've tested to date (at least with the PCL that we generate from our apps which is quite complex as they include specialized fonts and macros).
Ghostscript developers have recently integrated their sister products GhostXPS, GhostPCL and GhostSVG into their Ghostscript source code tree. (It's now called GhostPDL.) So all of these additional functionalities (load, render and convert XPS, PCL and SVG) are now available from there.
This means you could build their language switching binary from their sources. This, in theory, can consume PCL, PDF and PostScript and convert this to a host of other formats. While it worked for me whenever I needed it, Ghostscript developers recommend to stop using the language switching binary (since it's 'almost non-supported' -- see KenS' comment to this answer) and instead switch to using the explicit binaries pcl6.exe (PCL input), gsvg.exe (SVG input, also 'almost non-supported') and gxps.exe (support status unclear to me).
So to 'convert PCL code to PDF format' as the request areads, you could use the pcl6 command line utility, a sister product to Ghostscript's gs/gswin32c.exe.
Sample commandline:
pcl6.exe \
-o output.pdf \
-sDEVICE=pdfwrite \
[...more parameters as required (optional)...] \
-f input.pcl
Updated as per KenS' hints in the comment....
There is a series of reference books from HP; you could re-implement a PCL parser and output corresponding PDF.
You might start with the "PCL 5 Printer Language Technical Reference Manual" (http://h20000.www2.hp.com/bc/docs/support/SupportManual/bpl13210/bpl13210.pdf) . Search HP for more (http://search.hp.com/query.html?qt=PCL+reference).
Or you could steal code or ideas from GhostPCL (http://www.ghostscript.com/GhostPCL.html)