I was wondering if it is possible to programaticaly create a PDF file with an acceptable quality for the production press, ideally using only open-source libraries.
Right now the process is like this:
-create texts and images
-merge them into a postscript file
-use Acrobat Distiller to convert it to PDF (Acrobat distiller helps you check all the parameters of the PDF)
-send the PDF to the press
What I want is something like:
-take all texts and pictures in this folder
-encode them into the press-ready PDF, something similar to what Distiller produces
-send them to the press
How would you do that?
Many thanks...
Are the Ghostscript's gsdll32.dll and gswin{32,64}.c.exe with their source code and the GPL3 enough (or too much) of Open Source? They ship as part of all recent releases (newest one currently: v8.71).
Ghostscript can create very good quality PDF. See here for the most recent documentation about its PDF/A and PDF/X support.
Note, that this documentation until very recently was a bit misleading: it missed hinting at the requirement to edit+adapt the referred PDFA_def.ps or PDFX_def.ps templates. If you followed the old documentation without editing the templates to specifically point to the ICC color profile you wanted to embed, your output would be valid PDF, but would not pass all checks testing for compliancy with the official PDF/A+PDF/X standards.
You can generate pdfs using f.e. TeXML and XeLaTeX (first one to make scripting easier -- TeX has lots of quirks in syntax).
I also tried OpenJade and its DocBook support, but the quality was lower. TeX seems to do typesetting much better.
Both ways are using standalone programs... which you can use in shell scripts or call using system facilities.
You didn't mention which version of Distiller you're using. Recent versions do have a setting that lets you generate (different verions of) PDF/X. See also the *.joboptions files which ship with Distiller.
Related
I've already used ghostscript to check PDF files and now i need to identify a pdf with more than 1000 nodes. Is it possible to use ghostscript to count the number of nodes a PDF have?
My knowledge with ghostscript is basic and I have difficulty finding a solution in ruby (PDF reader 1.3) or using tools like imageMagick.
Edit:
I can not explain in a more technical way what kind of node I'm looking for. These nodes are equivalent to those found in the corel draw. Initially I thought it would not have equivalent in pdf however the pitstop plugin has the functionality to indentify nodes.
Example of identified nodes by PitStop Pro
Those aren't 'nodes', they are the start and end points of path segments. Ghostscript doesn't have a device to extract paths. It could do so easily enough (and recover the curve control points which don't appear to be displayed in your PitStop screen grab).
However, Ghostscript isn't an editing tool, so its not at all clear what you would intend to do with the information.
If you want this, then you are going to have to either parse the PDF yourself, or write a Ghostscript device to retrieve the information, or write a program for some other tool (eg MuPDF) to extract path information.
I wasn't able to find anything on the internet and I get the feeling that what I want is not such a trivial thing. To make a long story short: I'd like to get my hands on the underlying code that describes the PDF document of a selected area from a .pdf file. I've been looking for libraries or open source readers but couldn't find anything useful yet.
Does there exist something that might be able to accomplish my needs here or anything that might be reused (like an open source reader) to get there a little faster and not having to write everything from scratch?
You can convert a whole PDF document to PostScript using pdftops, one of the utilities from the poppler PDF rendering library.
This utility enables you to convert individual pages, which is at least a start.
If you just want to extract bitmapped images, try pdfimages from the same package. This extraction can also be restricted to individual pages.
The poppler library was originally written for UNIX-like systems, but there are a couple of windows builds available.
The open source tool from iText called iText RUPS does what you want, showing you all the PDF commands for a particular PDF and allow you to visualize the structure and relationships.
http://sourceforge.net/projects/itextrups/
Given a PDF file. Can I find out which software/libraries (e.g. PDFBox, Adobe Acrobat, iText...) where used to created/edit it?
The Adobe specification defines the Producer field (see 'Mac OS X 10.5.6 Quartz PDFContext' in screenshot nimeshjm's answer) as the name of the application that "converted from another format to PDF". In case of generating a PDF programmatically, the PDF isn't really converted so you will normally find the name of the generating SDK here.
The Creator field is related and is defined as the name of the application that created the document from which the PDF was converted. This is typically MS Word or so.
Note that this is all by convention. In practice, you cannot really rely on this and you may encounter for example empty Producer fields.
You can try opening the file in Adobe Acrobat Reader and look at the properties.
You can find this in: File -> Properties in Adobe Acrobat Reader after you open the pdf file.
You can probably get away without any PDF libraries for this type of operation. It won't be 100% reliable but I think you can probably assume 99% reliability.
So... write some code to open your PDF as a text stream and seaarch down for /Producer. You will find something like this:
69 0 obj
<<
/Creator (PDF+Forms 2.0)
/CreationDate (D:20010627111809)
/Title (Demo)
/Producer (Cardiff Software - TELEform 7.0)
/ModDate (D:20010627111810-05'00')
>>
Grab the bits between the parentheses and Bob's your uncle. Technically the text can be stored in other formats to but I think those will be pretty uncommon for this particular type of entry.
If you can't find anything here then look for the XMP data which is always guaranteed to be in clear text. It will look something like this,
39 0 obj
<</Subtype/XML/Length 15172/Type/Metadata>>stream
<?xpacket begin='' id='W5M0MpCehiHzreSzNTczkc9d'?>
<x:xmpmeta xmlns:x="adobe:ns:meta/" x:xmptk="Adobe XMP Core 4.0-c320 44.293068, Sun Jul 08 2007 18:10:11">
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<rdf:Description rdf:about=""
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:xap="http://ns.adobe.com/xap/1.0/"
xmlns:xapGImg="http://ns.adobe.com/xap/1.0/g/img/"
xmlns:xapMM="http://ns.adobe.com/xap/1.0/mm/"
xmlns:pdf="http://ns.adobe.com/pdf/1.3/"
dc:format="application/pdf"
xap:CreatorTool="Adobe Illustrator CS2"
xap:CreateDate="2006-05-04T15:53:27-07:00"
xap:ModifyDate="2006-05-04T15:53:27-07:00"
xap:MetadataDate="2006-05-04T15:53:27-07:00"
xapMM:DocumentID="uuid:61AC83CBC0DBDA11A32BC847EF128E34"
xapMM:InstanceID="uuid:cba15bf3-d7da-4a4e-a563-fc20d13e258a"
pdf:Producer="Adobe PDF library 7.77">
<dc:title>
<rdf:Alt>
<rdf:li xml:lang="x-default">3.01 PDF components</rdf:li>
</rdf:Alt>
</dc:title>
...
The combination of these two is going to be practically always right. If you want 100% reliablity then by all means use a PDF library but for many purposes this should be sufficient.
My replies may feature concepts based around ABCpdf. It's what I work on. It's what I know. :-)
It is usually difficult to determine which software actually designed a PDF because most of Microsoft Office product can convert an edited file to PDF. By this I mean, opening a regular typed document, you have the option to save it as PDF. If you are familiar with Powerpoint slides, it can be easy to tell based on the design once the file is in PDF.
Where as on the other hand, Adobe Acrobat has the ability to create the file like those application forms we often download (from an embassy site, immigration site, etc).
Other software such as Adobe Photoshop, Illustrator, etc... can save files as PDF. Hope this help.
We have created PDFs from converting individual PostScript pages into a single PDF (and embedding appropriate fonts) using GhostScript.
We've found that an individual page of the PDF cannot be linked to; for example, through the usage of
http://xxxx/yyy.pdf#page=24
There must be something within the PDF that makes this not possible. Are there any specific GhostScript options that should be passed when creating the PDF that would allow this type of page-destination link to work?
There are no specific pdfwrite (the Ghostscript device which actually produces PDF) options to do this. Without knowing why the (presumably) web browser or plugin won't open the file at the specified page its a little difficult to offer any more guidance.
What are you using to view the PDF files ?
Can you make a very simple file that fails ? Can you make that file public ?
If I can reproduce the problem, and the file is sufficiently simple, it may be possible to determine the problem. By the way, which version of Ghostscript are you using ?
Not sure this is a programming question, but we use LaTeX for all our API documentation and user documentation, so I hope it will go through.
Can someone please explain what are the relative merits of using pdflatex as opposed to the "classic" technique of
latex foo
dvips -Ppdf foo
ps2pdf foo.ps
From time to time I run into people who have difficulty because things don't work in pdflatex, and I know that using pdflatex gives up two things I have grown to value:
Can't use the very speedy xdvi viewer
Can't use the PStricks package
I should add that I typically get PDF with hyperlinks by using something on the order of
\usepackage[ps2pdf,colorlinks=true]{hyperref}
so it's not necessary to use pdflatex to get good PDF.
So
What are the advantages of pdflatex that I don't know about?
What are the disadvantages of the old tools that I've overlooked?
My favorite pdflatex feature is the microtype package, which is available only when using pdflatex to go directly to PDF, and really produces stunning results with no effort on my part. Apart from that, the only caveats I run into are image formats:
pdflatex supports PDF, PNG, and JPG images.
the postscript drivers support (at least) EPS.
Also, if you want to install fonts, the procedures are slightly different depending on what fonts that driver supports. (Hint: use XeTeX to instantly enable OpenType fonts.)
As it turns out, I recently read a post that shows the difference directly. Any document that uses tables or narrow columns will be improved automatically. I also find the inter-word spacing to be far more pleasing with pdflatex.
Is xdvi much faster than xpdf? I find the edit, TeX, view cycle to be very quick with pdflatex.
Have you tried MetaPost or MetaFun for graphics? I tend to put graphics creation in the hands of the capable, but MetaFun would likely be the package I'd use. Just reading the manuals is a pleasure.
Also pdftex is the engine under development (towards luatex) and maintenance. I'm not sure the DVI counterparts are as actively maintained.
PStricks is supplanted by Tikz.
I didn't use xdvi in years, so pardon the trollish rhetorical questions: Does xdvi display vector fonts? Does it support synctex (jumping to and from code)? Does it have the confort of use of PDF readers like Skim?
Taco Hoekwater is working on Escrito, a Postscript interpreter written in Lua, which would allow you to use pstricks in Luatex. He has an impressive project completion record: maybe I should have used "will" rather than "would" in the previous sentence.
I used pdflatex to generate the PDF for my ICFP 2009 paper. (I still needed to use standard latex to generate the PostScript file.) I did so for two reasons:
I couldn't seem to get ps2pdf to generate Letter, rather than A4 output, no matter what command line options I used.
For the printers, I needed to produce a version 1.3 PDF file, not 1.4. pdflatex made this easy to do. I set the PDF author and title information while I was at it.
Both of these problems may be fixable in some way, but as a first-time latex user, I didn't find any obvious solutions, nor did more experienced users whom I'd asked.