Zooming a picture vs zooming a PDF

I'm rendering a PDF using the PDF.js library. There I can specify a zoom (scale) property, which is fine. I can set a pretty high zoom, let's say 8x, and still get decent quality in the rendered PDF. However, if I convert the same PDF to a raster image format like JPEG and then render that at a high zoom, the quality is very bad. Why is that so?

You are describing the difference between vector graphics and raster graphics. A vector graphic format contains commands telling how to draw an image. A raster format is an array that tells what the color is at each position in the image.
PDF is largely a vector format (though you can embed a raster image in a PDF). A PDF that has an instruction to draw a line or draw a character can be zoomed to any degree and the drawing will still be correct.
In a raster format, if you zoom, eventually you see the individual pixels in the array and they cannot be zoomed any more without distortion. Text in a JPEG or PNG file becomes jagged as you zoom.
On the other hand, try to create a photographic quality image just with drawing commands and you would get huge files.
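To make the difference concrete, here is a minimal sketch in plain Python. It is not how PDF.js or any real renderer works internally; `render_disc`, `upscale_nearest` and `is_blocky` are hypothetical helpers made up for illustration. The "vector" path re-runs a drawing command (a filled disc) at the target scale, while the "raster" path can only stretch the pixels it already has.

```python
def render_disc(scale, size=8, radius=3.0):
    """Run the 'drawing command' (a filled disc) at the requested scale."""
    n = size * scale
    centre = size / 2.0
    rows = []
    for py in range(n):
        row = []
        for px in range(n):
            # Map the pixel centre back into the drawing's own coordinates.
            ux = (px + 0.5) / scale
            uy = (py + 0.5) / scale
            inside = (ux - centre) ** 2 + (uy - centre) ** 2 <= radius ** 2
            row.append(1 if inside else 0)
        rows.append(row)
    return rows


def upscale_nearest(pixels, factor):
    """Zoom an existing raster by repeating every pixel factor x factor times."""
    out = []
    for row in pixels:
        wide = [p for p in row for _ in range(factor)]
        for _ in range(factor):
            out.append(list(wide))
    return out


def is_blocky(pixels, factor):
    """True if every factor x factor block holds a single value."""
    n = len(pixels)
    for by in range(0, n, factor):
        for bx in range(0, n, factor):
            block = {pixels[y][x]
                     for y in range(by, by + factor)
                     for x in range(bx, bx + factor)}
            if len(block) > 1:
                return False
    return True


def show(pixels):
    return "\n".join("".join("#" if p else "." for p in row) for row in pixels)


vector_zoomed = render_disc(scale=4)                      # redraw at 4x
raster_zoomed = upscale_nearest(render_disc(scale=1), 4)  # stretch the 1x pixels

print(show(vector_zoomed))
print()
print(show(raster_zoomed))
print(is_blocky(vector_zoomed, 4), is_blocky(raster_zoomed, 4))
```

Both results are 32x32, but the stretched raster is built entirely from 4x4 blocks of the original pixels, while the redrawn disc has an edge computed at the new resolution.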

Related

Why is the pure Cyan image in this PDF not displayed as pure Cyan?

Can anyone tell why the image in this pdf does not display as 100% Cyan?
clrtestc - NOPREBLEND32.PDF
Warning: I probably know just enough about pdf and colour to be dangerous!
I'm pretty sure each colour plane of the image is in a separate image. Here's a blended version if that helps.
I know the ColorSpace is DeviceCMYK
I'm pretty sure there is only 100% Cyan in the image, at least there was when it went into the PDF converter.
What went in:
CMYK: 100,0,0,0
RGB: 0,255,255
What I measure coming out:
CMYK: 100,27,0,6
RGB: 0,173,238
I'm foxed! Is there some filter affecting the rendering of the PDF?
There's also Magenta, Yellow and Black versions if they help.
Any help much appreciated.
The PDF file is extraordinarily complicated, it has numerous Forms, some of them nested, most of which are empty. However there only appears to be one image, which is defined in an Indexed CMYK space. So as far as I can see, this is indeed a 100% cyan image.
The extended graphics state does use the Multiply Blend mode, and there is no group and no page group specified, so the colour space used for the blending will depend on the colour model of the output device. If that's a monitor, then it's entirely possible that the resulting output will be RGB.
That's because your CMYK image needs to be converted to RGB in order to be blended using that colour space.
Incidentally, the image is in an Indexed colour space. In your image all the image samples have the same value, that value is then consulted in a lookup table, and that table returns the CMYK components. So no, there is not one image per colour plane, or at least, not in this file.
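As a rough illustration of that lookup, here is a simplified sketch in Python. The table contents and sample values are made up (I have not copied them from your file), and `decode_indexed` is a hypothetical helper. It assumes the usual layout where each table entry is one byte per component of the base space.

```python
# Assumed lookup table with a single entry: C=255, M=0, Y=0, K=0 (pure cyan).
lookup = bytes([255, 0, 0, 0])
components = 4  # DeviceCMYK has four components per entry

# Assumed image data: every sample holds the same index.
samples = [[0, 0, 0],
           [0, 0, 0]]


def decode_indexed(samples, lookup, components):
    """Replace each index with its CMYK entry, scaled to the range 0..1."""
    decoded = []
    for row in samples:
        out_row = []
        for index in row:
            entry = lookup[index * components:(index + 1) * components]
            out_row.append(tuple(value / 255 for value in entry))
        decoded.append(out_row)
    return decoded


cmyk_pixels = decode_indexed(samples, lookup, components)
print(cmyk_pixels[0][0])
```

Every pixel comes out as the same CMYK tuple, which is why a single image is enough here and no per-plane images are needed.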
To be honest, you're going to have to explain better how you are evaluating the content of the PDF file. As far as I can see the image is 100% cyan, and when rendered to a CMYK device, it will remain 100% cyan. If you render to an RGB device, it will be converted to RGB. A poor quality PDF consumer might decide to convert to RGB in the absence of a defined colour space for the blending operation.
Since the blending mode doesn't actually do anything (there's no defined alpha, SMask or any other transparency in the file) you could remove that and see if it sorts out your problem.
Edit
Your screen will be an RGB device, so no matter what the CMYK values in the PDF file are, there won't be any CMYK in the screenshot. The PDF rendering engine will have to convert the CMYK to RGB.
So the PDF rendering engine performs an opaque CMYK->RGB conversion. Then you take a picture of that RGB screen. You load that into an image editing application, and ask it what the RGB values are and presumably what it thinks are the CMYK equivalents.
If the CMYK->RGB calculation that the PDF viewer performs is not the inverse of the calculation that the RGB->CMYK image application performs, then you won't be getting the right values!
There's no way to predict what the RGB intermediate values 'should' be, because there is no 'right' answer here. Fundamentally this isn't a reliable technique for evaluating the colour.
It's hard to make any kind of recommendation without knowing what you are trying to achieve (and possibly why), and what tools you are prepared to use. I believe Acrobat Pro would allow you to look at the colour values directly for example. Or you could use something like Ghostscript to create a CMYK TIFF file, then open that in an image application which supports CMYK (like Photoshop) and look at the values there.
But rendering to the screen, taking a screenshot and trying to figure out what the CMYK values might or might not have been is not really going to work.
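To show why the round trip is unreliable, here is a small sketch using the simplest textbook conversion formulas. These are an assumption for illustration only: I don't know which conversion your viewer or your image editor actually uses, and a colour-managed viewer will almost certainly use something else.

```python
def cmyk_to_rgb_naive(c, m, y, k):
    """Naive conversion: c, m, y, k in 0..1 -> (r, g, b) in 0..255."""
    return tuple(round(255 * (1 - x) * (1 - k)) for x in (c, m, y))


def rgb_to_cmyk_naive(r, g, b):
    """Naive inverse: r, g, b in 0..255 -> (c, m, y, k) in 0..1."""
    r, g, b = r / 255, g / 255, b / 255
    k = 1 - max(r, g, b)
    if k == 1:
        return (0.0, 0.0, 0.0, 1.0)  # pure black, avoid dividing by zero
    return tuple((1 - v - k) / (1 - k) for v in (r, g, b)) + (k,)


# Naive forward conversion of 100% cyan.
print(cmyk_to_rgb_naive(1, 0, 0, 0))

# Naive inverse of the RGB value measured from the screenshot.
c, m, y, k = rgb_to_cmyk_naive(0, 173, 238)
print([round(100 * v, 1) for v in (c, m, y, k)])
```

With these formulas 100% cyan maps to RGB 0,255,255, the value you started from. Feeding the measured RGB 0,173,238 through the naive inverse lands close to the CMYK 100,27,0,6 you measured, which suggests (but doesn't prove) that the viewer converted to RGB one way and the image editor converted back another way.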

Why the size of file with cropped image is the same as of initial one?

I have scanned my copybook and want to crop out extra white regions with Inkscape.
To achieve this, I import initial image (PDF) to Inkscape, draw appropriate rectangle, and use Object->Clip->Set to cut out needed region. Then I resize page to drawing and save obtained page as new PDF file through File->Save a Copy.
I expected the new PDF file (with the cropped image) to be smaller than the initial PDF (with the uncropped image), but they are the same size.
What is the reason for this, and can it be worked around?
I use Inkscape 0.91 at Linux Mint 18.2.
Thank you in advance.
Because the original image is still there, fully intact and with all its contents. The cropping rectangle is just an instruction to the PDF viewer to hide those regions when rendering the image.
However in Inkscape you can bake the crop rectangles and when exporting to PDF "apply raster effects" which should actually alter the contained image(s).
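As a toy model of the difference (plain Python, not Inkscape's or any PDF library's API; `clip`, `crop` and `stored_samples` are hypothetical helpers), compare a clip that only records the visible box with a crop that actually throws pixels away:

```python
# A made-up 400x300 greyscale image, one integer per pixel.
image = [[(x + y) % 256 for x in range(400)] for y in range(300)]


def clip(image, box):
    """What a clip does: keep every pixel, just record the visible box."""
    return {"pixels": image, "clip": box}


def crop(image, box):
    """What a real crop does: discard the pixels outside the box."""
    x0, y0, x1, y1 = box
    return {"pixels": [row[x0:x1] for row in image[y0:y1]], "clip": None}


def stored_samples(obj):
    """Number of pixel values that would have to be written to the file."""
    return sum(len(row) for row in obj["pixels"])


box = (100, 50, 300, 250)
print(stored_samples(clip(image, box)))  # same as the original image
print(stored_samples(crop(image, box)))  # only what is inside the box
```

The clipped version still carries all 400x300 samples, so the file has no reason to shrink; only the cropped version stores fewer.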

matplotlib changing bitmap color mapping

I'm using matplotlib to generate some composite figures (from raw data and images). I'm trying to get the script to take image files of a few file formats, which are then plotted via:
Nxy = mpimg.imread(Nxy_filename)
imgplot = ax1.imshow(Nxy)
where ax1 is the subplot I want the image to show up in. This works fine for both PNG and JPEG images, but for a .bmp (of the same image) matplotlib seems to turn it blue in my composite figure (original and resulting images not shown). On the other hand, the png and jpg files look exactly the same as the original. Any idea why this would happen? I'm reluctant to blindly alter the color map in the code since the other image formats appear as expected.
It sounds like your PNG and JPEG images are RGB images that happen to be grey while the BMP image is grey scale. Check the shape of Nxy. My guess is it's two dimensional for the BMP while the PNG and JPEG image arrays have three dimensions.
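If that guess is right, one way to avoid blindly changing the colour map is to pick it from the array shape. A minimal sketch, where `imshow_kwargs` is a hypothetical helper and not part of matplotlib:

```python
def imshow_kwargs(shape):
    """Choose extra imshow() arguments from the shape of the image array."""
    if len(shape) == 2:
        # (rows, cols): scalar data, so imshow would apply its default
        # colour map unless told otherwise.
        return {"cmap": "gray"}
    if len(shape) == 3 and shape[2] in (3, 4):
        # (rows, cols, 3) or (rows, cols, 4): RGB(A), drawn as-is.
        return {}
    raise ValueError("unexpected image shape: %r" % (shape,))


print(imshow_kwargs((480, 640)))     # what the BMP probably looks like
print(imshow_kwargs((480, 640, 3)))  # what the PNG/JPEG probably look like

# Assumed usage with the names from the question:
#   print(Nxy.shape)
#   imgplot = ax1.imshow(Nxy, **imshow_kwargs(Nxy.shape))
```

Note that for 2-D data imshow also rescales the values to the data's own minimum and maximum by default, so you may want to pass vmin and vmax as well if absolute grey levels matter.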

Getting the cropping and rotation information of an image in a PDF

I have a PDF with a page with an image. I'm using a command line tool to extract this image. The page in the PDF shows only a part of the image, because the extracted image has a lot more "contents" and they are slightly rotated. This happens, I assume, because some sort of cropping and/or rotation was applied to the image when the PDF was built.
Is there any way, using iText, to figure out the offset and rotation applied to the image? That would allow me to crop the extracted image in the same way and end up with something similar to what's visible on the PDF page.

resolution from a PDFPage?

I have a PDF document that is created by creating NSImages with size in 72dpi pts, each has a single representation which is measured in pixels. I then put these images into PDFPages with initWithImage, and then save the document.
When I open the document, I need the resolution of the original image. However, all of the rectangles that PDFPage gives me are measured in points, not pixels.
I know that the information is in there, and I suppose I can try to parse the PDF data myself, by going through the voyeur.app example... but that's a WHOLE lot of effort to do something that should be pretty normal...
Is there an easier way to do this?
Added:
I've tried two techniques:
Get the PDFRepresentation data from the page, and use it to make a new NSImage via initWithData. This works, however, the image has both size and pixel size in 72dpi.
Draw the PDFPage into a new off-screen context, and then get a CGImage from that. The problem is that when I'm making the context, it appears that I need to know the size in pixels already, which defeats part of the purpose...
There are a few things you need to understand about PDF:
The PDF Coordinate system is in points (1/72 inch) by default.
The PDF Coordinate system is devoid of resolution. (this is a white lie - the resolution is effectively the limits of 32 bit floating point numbers).
Images in PDF do not inherently have any resolution attached to them (this is a white lie - images compressed with JPEG2000 still have resolution in their embedded metadata).
An Image in PDF is represented by an object that contains a series of samples that are stored using some compression filter.
Image objects can be rendered on a page multiple times at any size.
Since resolution is defined as the number of pixels (or samples) per unit distance, resolution only means something for a particular rendering of an image on a page. So if you are rendering a particular image to fill the page, then the resolution in dpi is
xdpi = image_width / (pageWidthInPoints / 72.0);
ydpi = image_height / (pageHeightInPoints / 72.0);
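The same arithmetic as a small runnable sketch in Python. `full_page_dpi` is a hypothetical helper, and the numbers in the example are made up (a US Letter page of 612 x 792 points):

```python
def full_page_dpi(image_width_px, image_height_px, page_width_pt, page_height_pt):
    """Resolution of an image stretched to fill the whole page (72 points per inch)."""
    xdpi = image_width_px / (page_width_pt / 72.0)
    ydpi = image_height_px / (page_height_pt / 72.0)
    return xdpi, ydpi


# A 2550 x 3300 pixel image filling a 612 x 792 point page.
print(full_page_dpi(2550, 3300, 612, 792))
```

If the image does not share the page's aspect ratio, the two values differ, which is why both are computed separately.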
If the image is not being rendered to the full size of the page, a complete solution is very tricky. Adobe prescribes that images should be treated as being 1x1 and that you change the page transformation matrix (the current transformation matrix, or CTM) to determine how to render them. This means that you would need the matrix at the point of rendering the image and you would need to push the points (0, 0), (0, 1), (1, 0) through the matrix. The Euclidean distance between (0, 0)' and (1, 0)' will give you the width in points and the Euclidean distance between (0, 0)' and (0, 1)' will give you the height in points.
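Assuming you already have the matrix as the six numbers [a b c d e f] that PDF uses, pushing the three corner points through it can be sketched in Python like this. The helper names and the example matrix are made up for illustration:

```python
import math


def apply(ctm, x, y):
    """Transform a point with a PDF matrix given as (a, b, c, d, e, f)."""
    a, b, c, d, e, f = ctm
    return (a * x + c * y + e, b * x + d * y + f)


def image_size_in_points(ctm):
    """Width and height of the 1x1 image square after transformation."""
    ox, oy = apply(ctm, 0, 0)
    xx, xy = apply(ctm, 1, 0)
    yx, yy = apply(ctm, 0, 1)
    width = math.hypot(xx - ox, xy - oy)
    height = math.hypot(yx - ox, yy - oy)
    return width, height


def image_dpi(ctm, width_px, height_px):
    """Effective resolution of this particular rendering of the image."""
    width_pt, height_pt = image_size_in_points(ctm)
    return width_px / (width_pt / 72.0), height_px / (height_pt / 72.0)


# Example: an image drawn 200 x 100 points, rotated 30 degrees, moved to (50, 60).
t = math.radians(30)
ctm = (200 * math.cos(t), 200 * math.sin(t),
       -100 * math.sin(t), 100 * math.cos(t), 50, 60)

print(image_size_in_points(ctm))
print(image_dpi(ctm, 1000, 500))
```

Using distances rather than just reading a and d off the matrix is what makes this work for rotated or skewed images.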
So how do you get that matrix? Well, you need the content stream for the page and you need to write a PDF interpreter that can rip the content stream and keep track of changes to the CTM. When you reach your image, you extract the CTM for it.
To do that last step should be about an hour with a decent PDF toolkit, provided you are familiar with the toolkit. Writing that toolkit is several person years of work.
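To give a feel for what "keeping track of changes to the CTM" means, here is a toy sketch in Python. It is nowhere near a real PDF interpreter: it assumes an already decompressed content stream that can be split on whitespace, and it only understands q, Q, cm and Do. It ignores strings, inline images, nested form XObjects, page rotation and everything else. All names are made up.

```python
IDENTITY = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def multiply(m, n):
    """Matrix product m x n for PDF matrices stored as (a, b, c, d, e, f)."""
    a1, b1, c1, d1, e1, f1 = m
    a2, b2, c2, d2, e2, f2 = n
    return (a1 * a2 + b1 * c2, a1 * b2 + b1 * d2,
            c1 * a2 + d1 * c2, c1 * b2 + d1 * d2,
            e1 * a2 + f1 * c2 + e2, e1 * b2 + f1 * d2 + f2)


def find_image_ctms(content):
    """Return (name, ctm) for every Do operator in a simple content stream."""
    stack = []      # saved graphics states (only the CTM in this sketch)
    ctm = IDENTITY
    operands = []   # tokens seen since the last operator we handled
    found = []
    for token in content.split():
        if token == "q":            # save graphics state
            stack.append(ctm)
            operands = []
        elif token == "Q":          # restore graphics state
            ctm = stack.pop()
            operands = []
        elif token == "cm":         # concatenate a matrix onto the CTM
            m = tuple(float(v) for v in operands[-6:])
            ctm = multiply(m, ctm)
            operands = []
        elif token == "Do":         # paint an XObject (e.g. an image)
            found.append((operands[-1], ctm))
            operands = []
        else:
            operands.append(token)
    return found


stream = "q 2 0 0 2 0 0 cm q 100 0 0 50 10 20 cm /Im0 Do Q Q"
print(find_image_ctms(stream))
```

In the example the page is first scaled by 2 and the image matrix is then applied on top of that, so the image /Im0 ends up 200 x 100 points with its origin at (20, 40).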