Is there such a thing as a reference rendering for a given PDF? - pdf

I'm working with a lot of third-party PDFs and since we are thinking about changing our rendering engine, I would like to compare different available engines. However, there are subtle differences in the rendering, so that I had to wonder, what do I have to compare those renderings against, is there any kind of reference renderer? Is it Adobe's Acrobat?

Yes, essentially every user expects every PDF renderer to match Adobe Acrobat.
Note though that the PDF specification does not describe rendering down to the pixel level, so some slight variations is expected and normal.
In particular Font hinting and aliasing are acceptable differences. And of course you can get into CMYK and how to best represent that on a RGB screen, so some variance can happen depending on the color profile selected.

Related

Blurry results on EPS/PDF import [duplicate]

I want to create a visualization of a matrix for some academic work. I decided to go about this by having the pixels in the image correspond to the values in the matrix. I created the nice small png that follows:
When properly scaled up, you get a very reasonable image:
This is a screenshot from within inkscape. However, when export this as a pdf, both evince and chrome do a terrible job at upscaling what should be very trivial, and instead I get something that looks like:
The pdf itself seems to scale appropriately well for printing, but unfortunately I do a lot of my editing without printing, and this looks unacceptable. I did find this incredibly old thread about people seeming to have a similar issue with chrome's pdf viewer, and the "solution" was to just upscale the raster graphics. This is a solution, but is terribly inefficient.
Is anyone aware of a way to change the pdf so that it gets upscaled appropriately? Maybe a config change in evince or chrome that will render these properly? Even a nice way to go from a raster image to a vector image might be suitable?
The comments aggregated into an answer...
An image dictionary in a PDF has an (optional) boolean entry Interpolate. It is specified as a flag indicating whether image interpolation shall be performed by a conforming reader.
The program used by the OP to create the PDF, Inkscape, seems to have explicitly set this flag to true. Editing the PDF to unset this flag creates a file which looks as desired by the OP.
(This also is a solution proposed in this Inkscape forum thread eventually found by the OP, which is to save the PDF with high-resolution bitmaps embedded. File -> Inkscape Preferences -> Bitmaps -> Resolution for Create Bitmap Copy, and set it to 6000 dpi)
The fact that interpolation looks different in different viewers and different output media, is by design. The PDF specification states on interpolation:
A conforming Reader may choose to not implement this feature of PDF, or may use any specific implementation of interpolation that it wishes.
A different way to get around this problem (especially as some PDF viewers have the tendency to not really live up to the specification and e.g. interpolate ignoring that flag) would be to use vector graphics here, drawing the bitmap pixels as rectangles. The result should be optimal.

PDF "Canonicalization"

I am writing a library to generate PDF reports using prawn reports.
One of the features I wish to my gem is the ability to provide means of testing the generation of reports.
The problem is that two visually equal PDFs can have different files.
Is there a way to make sure that 2 visually equal PDF have the same bits in the file? Something like XML canonicalization.
'Visual equality' (or visual similarity': where only a small percentage of pixels is different for each page) of 2 different PDFs can occur even if the internal structure of PDF objects is very different. (Think of a page of 'text', which may use real fonts or which may use 'outline' vector graphics for each glyph's shape...)
That means this equality can only be determined by rendering the two files at the same resolution to page images and then comparing both image sets pixel by pixel. The result of the comparison could be another pixel image that shows all differing pixels as red, or, at your preference, just the number of pixels which do not agree.
A scriptable way to do this with the help of ghostscript, pdftk and ImageMagick I've described in this answer:
How to unit test a Python function that draws PDF graphics?
Alternatively, you may have a look at
diffpdf
(which is available for Linux, Unix, Mac OS X and Windows): it also can compare two PDF files visually.
[ Your literal question was this: "Is there a way to make sure that 2 visually equal PDF have the same bits in the file?" -- However, I'm not sure if you really meant it that way -- hence my above answer. Otherwise I'd have to say: If two PDF files are visually equal, just generate their respective MD5sum to determine if they have the same bits in each file... ]

what sizes are valid for true type fonts?

I am trying to use true type fonts, but finding that when I load different ones, some do not display at various font sizes while others do. It seems, although I am not certain, that some simply do not show below a certain font size. Is there any way to easily tell what sizes are valid for a ttf by examining the file and not simply trying it out in my app?
UPDATE: Considering smilingthax's answer, it would seem like these must be bitmap glyphs since the smaller sizes are not showing up (but using some system font instead I think). Does someone know how I can determine the valid sizes I can use with them using the iOS sdk (iPad)?
TTFs can contain Bitmap glyphs and/or Outline glyphs. Outline glyphs are scalable to any size (e.g. "31.4592 pt") although they are often optimized (hinted) for certain sizes (8,10,12,16,....72). You can get these sizes via your Font API (Win32, Freetype, ...). Bitmap glyphs are only available in certain sizes.
For example Freetype's FTFace object contains a .face_flags member which can have FT_FACE_FLAG_FIXED_SIZES set. Also the sizes are accessible via .num_fixed_sizes and the .available_sizes array.

PDF Colo(u)r Analysis (without Acrobat itself ?)

Is there a library/tool which would list all colours used in a PDF document ?
I'm sure Acrobat itself would do this but I would like an alternative (ideally something that could be scripted).
So the idea is if you have a very simple PDF document with four colours in it the output might say :
RGB(100,0,0)
RGB(105,0,0)
CMYK(0,0,0,1)
CMYK(1,1,1,1)
You could explore the insides with pdfbox, but you would have to write some code to find and catalog all those colors.
Most PDF tools have access to this information but no api to access it. You could take any tool and add it in
Apago PDFspy generates an XML file containing all kinds of metadata extracted from PDF files. It reports color usage including spot colors.
We recently added a function called GetPageColorSpaces(0) to the Quick PDF Library - www.quickpdflibrary.com to retrieve much of the ColorSpace info used in the document.
Here is some sample output.
Resource,\"QuickPDFCS2eb0f578\",Separation,\"HKS 52 E\",DeviceCMYK,0.95,0,0.55,0
Resource,\"QuickPDFCSb7b05308\",Separation,\"Black\",DeviceCMYK,0,0,0,1
Resource,\"QuickPDFCSd9f10810\",Separation,\"Pantone 117 C\",DeviceCMYK,0,0.18,1,0.15
Resource,\"QuickPDFCS9314518c\",Separation,\"All\",DeviceCMYK,0,1,0,0.5
Resource,\"QuickPDFCS333d463d\",Separation,\"noplate\",DeviceCMYK,1,0,0,0
Resource,\"QuickPDFCSb41cafc4\",Separation,\"noprint\",DeviceCMYK,0,1,0,0
Resource,\"Cs10\",DeviceN,Black,Colorant,-1,-1,-1,-1
Resource,\"Cs10\",DeviceN,P1495,Colorant,-1,-1,-1,-1
Resource,\"Cs10\",DeviceN,CalRGB,Colorant,-1,-1,-1,-1
Resource,\"Cs10\",Separation,\"P1495\",DeviceCMYK,0,0.31,0.69,0
XObject,\"R29\",Image,,DeviceRGB,-1,-1,-1,-1
Disclaimer: I work at Atalasoft.
Our product, DotImage with the PDF Reader add-on, can do this. The easiest way is to rasterize the page and then just use any of our image analysis tools to get the colors.
This example shows how to do it if you want to group similar colors -- the deployed example will only work for PNG and JPEG, but if you download the code, it's trivial to include the add-on and get PDF as well (let me know if you need help)
Source here:
http://www.atalasoft.com/cs/blogs/31appsin31days/archive/2008/05/30/color-scheme-generator.aspx
Run it here:
http://www.atalasoft.com/31apps/ColorSchemeGenerator
If you are working with specific and simple PDF documents from a constrained source then you may be able to find the colors by reading through the content stream. However this cannot be a generic solution.
For example PDF documents can contain gradients or transparency. If your document contains this type of construct then you are likely to end up with a wide range of colors rather than a specific set.
Similarly many PDF documents contain bitmapped images. Given that these will need to be interpolated to be displayed at different resolutions, the set of colors in a displayed PDF may be bigger or different to (though obviously broadly similar to) the embedded bitmap.
Similarly many PDF documents contain constructs in multiple color spaces that are rendered into different color spaces. For example a PDF might contain a DeviceRGB bitmap, a line in an ICC based CMYK color and a Lab based rectangle. The displayed version might be in sRGB for display or CMYK for print. Each of these will influence the precise set of colors.
So the only 100% valid answer is going to be related to a particular render of a PDF at a particular resolution to a particular color space. From the resultant bitmap you can determine the colors that have been used.
There are a variety of PDF libraries that will do this type of render including DotImage (referenced in another answer) and ABCpdf .NET (on which I work).

Is there a way to use custom fonts in a PDF file?

Well basically I'm finishing school in mid December so I'm just brushing up my resume and I'm wondering if there's a way to use custom fonts (in this case Calibri and Cambria) in a PDF file and make them render correctly on all computers.
Thanks in advance!
EDIT: I'm using MS Word 2007, but am open to suggestions
PDFs don't store text and fonts like other documents, they actually convert the font to vectors, that way no matter what font you use, the document displays exactly as expected. This is why searching for text inside the PDF is such a problem for 3rd party PDF Readers and why even Adobe themselves use to distribute 2 versions of Acrobat (one with text search, one without).
Another thing to keep in mind is, PDF isn't pixel exact, it's ratio exact. PDF readers generally do not use a 100% zoom level, instead most people read them at "fit to screen" or "fit to page". I point this out because I'm guessing the reason you are trying to use those new Vista/Office 2007 fonts is because of their LCD subpixel support (improves readability on LCD screens). This feature will not translate into the PDF, since the letter becomes a vector, subpixel information is lost, and even if it wasn't, becomes useless because the vector will be sized to something other than you intended at view time.
The PDF format is capable of embedding fonts, if the font has been marked embeddable by its creator. You'll have to check the software that's creating your PDF to see if it has the capability and how to enable it.
theoretically speaking, on technical side, embedding/not embedding ability, regarding the fonts, is settled with a special flag in font file (ttf or opentype or type1)
you can view this special embedding flag with any font editor program (I recommend
FontCreator (by High-logic)
http://www.high-logic.com/font-editor/fontcreator.html
with a free trial fully operative and without limitations
you can also change embedding/not embedding flag, but legally speaking, for the 99% of fonts commercially distributed, this breaks the license of font