I am trying to extract the background image of a PDF page to an SVG (using the xpdf library). The problem I am facing is that the PDF contains additional images/graphics (presumably outside the cropbox) that are not rendered by PDF readers, but the corresponding SVG contains these images/graphics. I tried setting the viewBox attribute of the SVG to correspond to the cropbox bounds of that PDF page, but the resulting SVG still displays some of the graphics objects that are not rendered in the PDF. I also tried adding a clip path to the SVG, a rectangular clipping region with bounds corresponding to the PDF cropbox, but this too did not eliminate some of the additional graphics elements not seen in the PDF. Any idea what the problem could be? What is the right way to carry over the PDF cropbox to SVG? By the way, the SVGs generated in both cases mentioned above (viewBox and clipping region approaches) were fairly close in dimensions to the viewable area of the PDF page, and the additional elements were seen only close to the edges. Is it that the cropbox dimensions obtained from the PDF should not be used directly in SVG?
It turns out that the problem was that my code did not transform the PDF cropbox (as given by xpdf) to user coordinates using the CTM, the current transformation matrix (also obtainable through xpdf). After applying the transformation, the resulting SVG matches the rendered portion of the PDF page.
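To illustrate the fix described above, here is a minimal sketch of pushing a crop box through a transformation matrix before using it as an SVG viewBox. It assumes the CTM is available as the usual six-number PDF matrix `[a, b, c, d, e, f]` and the crop box as `(x1, y1, x2, y2)`. The function names are made up for this example and are not xpdf API calls.

```python
def apply_ctm(ctm, x, y):
    """Map a point through a PDF-style matrix [a, b, c, d, e, f]."""
    a, b, c, d, e, f = ctm
    return (a * x + c * y + e, b * x + d * y + f)


def transform_box(ctm, box):
    """Transform all four corners of (x1, y1, x2, y2) and return the
    axis-aligned bounds of the result as (min_x, min_y, max_x, max_y)."""
    x1, y1, x2, y2 = box
    corners = [apply_ctm(ctm, x, y)
               for x, y in ((x1, y1), (x2, y1), (x2, y2), (x1, y2))]
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return (min(xs), min(ys), max(xs), max(ys))


def svg_viewbox(ctm, crop_box):
    """Build an SVG viewBox string from the transformed crop box."""
    min_x, min_y, max_x, max_y = transform_box(ctm, crop_box)
    return f"{min_x:g} {min_y:g} {max_x - min_x:g} {max_y - min_y:g}"


# Hypothetical values: a matrix that flips the y axis on a 792-unit-high page.
flip_y = [1, 0, 0, -1, 0, 792]
viewbox = svg_viewbox(flip_y, (10, 20, 110, 220))
```

All four corners are transformed, rather than only two, so that the bounds stay correct if the matrix includes a rotation.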
Related
I have scanned my copybook and want to crop out the extra white regions with Inkscape.
To achieve this, I import the initial image (PDF) into Inkscape, draw an appropriate rectangle, and use Object->Clip->Set to cut out the needed region. Then I resize the page to the drawing and save the resulting page as a new PDF file through File->Save a Copy.
I expected the new PDF file (with the cropped image) to be smaller than the initial PDF (with the uncropped image), but they are the same size.
What is the reason for this, and can it be worked around?
I use Inkscape 0.91 on Linux Mint 18.2.
Thank you in advance.
Because the original image is still there, fully intact and with all its contents. The cropping rectangle is just an instruction to the PDF viewer to hide those regions when rendering the image.
However, in Inkscape you can bake in the crop rectangle and, when exporting to PDF, "apply raster effects", which should actually alter the contained image(s).
I'm rendering a PDF using the pdf.js library. There I can specify a zoom (scale) property, which is fine. I can define a pretty high zoom, let's say 8x, and still get decent quality from the rendered PDF. However, if I take the same PDF converted to a graphic image format like JPEG and then try to render it with high zoom, the quality is very bad. Why is that so?
You are describing the difference between vector graphics and raster graphics. A vector graphics format contains commands telling how to draw an image. A raster format is an array that tells what the color is at each position in the image.
PDF is largely a vector format (although you can embed a raster image in a PDF). A PDF that has an instruction to draw a line or draw a character can be zoomed to any degree and the drawing will be correct.
In a raster format, if you zoom, eventually you see the individual pixels in the array and they cannot be zoomed any more without distortion. Text in a JPEG or PNG file becomes jagged as you zoom.
On the other hand, try to create a photographic quality image just with drawing commands and you would get huge files.
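The difference can be sketched with a toy example. The code below is only an illustration and does not reflect how pdf.js or a JPEG decoder works: a "vector" line is re-evaluated at the target resolution, while a "raster" version is drawn once at low resolution and then enlarged by pixel replication. The length of the longest horizontal run of set pixels is used as a rough measure of how blocky the result looks.

```python
def rasterize_line(size):
    """Draw the line y = x / 2 on a size x size grid by evaluating the
    drawing command at that resolution."""
    grid = [[0] * size for _ in range(size)]
    for x in range(size):
        grid[x // 2][x] = 1
    return grid


def upscale(grid, factor):
    """Nearest-neighbour zoom: every pixel becomes a factor x factor block."""
    return [[v for v in row for _ in range(factor)]
            for row in grid for _ in range(factor)]


def longest_flat_run(grid):
    """Length of the longest horizontal run of set pixels (stair-step size)."""
    best = 0
    for row in grid:
        run = 0
        for v in row:
            run = run + 1 if v else 0
            best = max(best, run)
    return best


vector_zoomed = rasterize_line(64)              # redraw at 8x resolution
raster_zoomed = upscale(rasterize_line(8), 8)   # draw small, then enlarge 8x

print(longest_flat_run(vector_zoomed))  # steps stay 2 pixels wide
print(longest_flat_run(raster_zoomed))  # steps grow to 16 pixels wide
```

Both grids are 64 x 64, but only the redrawn one keeps fine steps; the enlarged one shows the original pixels as blocks.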
This is related to my earlier post where I try to extract vector graphics in a PDF to an SVG file. I obtained this SVG corresponding to a page in a PDF. Unfortunately, though the shapes in the SVG are correct, the fill colors seen in the PDF are not seen when the SVG is rendered by any of the standard browsers. The upper left triangle in the SVG is supposed to have a yellow fill and the lower right triangle is expected to have a light blue-green fill. I do see 2 rectangular clip paths (ids clip_2 and clip_3) in the SVG XML having these colors, but why isn't SVG rendering displaying these colors?
From the spec:
The raw geometry of each child element exclusive of rendering
properties such as ‘fill’, ‘stroke’, ‘stroke-width’ within a
‘clipPath’ conceptually defines a 1-bit mask (with the possible
exception of anti-aliasing along the edge of the geometry) which
represents the silhouette of the graphics associated with that
element. Anything outside the outline of the object is masked out.
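In other words, a fill set on a shape inside a clipPath is ignored; the color has to be set on the element that references the clip. Below is a minimal sketch that builds such an SVG as a string. The id `clip_2` is taken from the question, while the coordinates, the triangle, and the helper name `clipped_fill` are made up for illustration.

```python
def clipped_fill(clip_id, clip_rect, fill, shape_points):
    """Return SVG markup where the clipPath holds geometry only and the
    fill is placed on the polygon that is being clipped."""
    x, y, w, h = clip_rect
    points = " ".join(f"{px},{py}" for px, py in shape_points)
    return (
        f'<clipPath id="{clip_id}">'
        f'<rect x="{x}" y="{y}" width="{w}" height="{h}"/>'
        f'</clipPath>\n'
        f'<polygon points="{points}" fill="{fill}" '
        f'clip-path="url(#{clip_id})"/>'
    )


# Hypothetical upper-left triangle, filled yellow and clipped to a rectangle.
svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100">\n'
    + clipped_fill("clip_2", (0, 0, 100, 100), "yellow",
                   [(0, 0), (100, 0), (0, 100)])
    + "\n</svg>"
)
print(svg)
```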
I have a PDF with a page that contains an image. I'm using a command line tool to extract this image. The page in the PDF shows only part of the image: the extracted image has a lot more "contents", and they are slightly rotated. This happens, I assume, because some sort of cropping and/or rotation was applied to the image when the PDF was built.
Is there any way, using iText, to figure out the offset and rotation applied to the image? That would allow me to crop the extracted image in the same way and end up with something similar to what's visible on the PDF page.
I have an iPad app that displays PDF pages. I need to add annotations on the image (if one exists on the PDF page), for which I need the coordinates at which the image is situated on the page. I am able to get the image data from the XObject, along with the image width and height, but I also need the x and y coordinates of the image. Any idea how to obtain the coordinates of the image by parsing the PDF page?
I'm assuming you have seen this Apple developer page describing how to parse XObjects: http://developer.apple.com/library/mac/#documentation/GraphicsImaging/Conceptual/drawingwithquartz2d/dq_pdf_scan/dq_pdf_scan.html
XObjects do not contain any position data, as they just describe image data that can be reused throughout the PDF.
From http://itext-general.2136553.n4.nabble.com/finding-the-position-of-xobject-in-an-existing-pdf-td2157152.html
"An XObject is a stream that can be reused in many different
other streams. For instance: you could have an image XObject
of a logo that appears on every page in the document.
Suppose that you have some pages in landscape and some in portrait.
Then the logo will have different coordinates on these different
pages. Therefore the position of the XObject IS NEVER STORED with
the XObject, the position can be found in the stream that refers
to the XObject.
Maybe your reaction is: "Oh right, then it's simple: I have to
look in the content stream of the pages using the XObject."
Yes and no. That's indeed where you should look, but it's not
simple. Because the actual position depends on the current
transformation matrix of the state at the moment the image is
added. It's quite some programming work to parse the content
stream and calculate the position of an XObject. "
I think you should find another option and avoid this altogether.
If you're still determined, you will have to use CGPDFScanner and track the transforms through the page's content stream.
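Once the current transformation matrix at the moment the image is drawn is known, the placement follows from it, because PDF draws an image into the unit square mapped by that matrix. The sketch below assumes the matrix has already been recovered from the content stream as six numbers `[a, b, c, d, e, f]`. It is plain arithmetic and not a CGPDFScanner or iText call, and `image_placement` is a name made up for this example.

```python
import math


def image_placement(ctm):
    """Derive position, size and rotation of an image from the matrix
    [a, b, c, d, e, f] in effect when the image XObject is painted."""
    a, b, c, d, e, f = ctm
    # The image occupies the unit square (0,0)-(1,1) mapped by the matrix.
    corners = [(e, f), (a + e, b + f), (a + c + e, b + d + f), (c + e, d + f)]
    xs = [p[0] for p in corners]
    ys = [p[1] for p in corners]
    return {
        "corners": corners,
        "bbox": (min(xs), min(ys), max(xs), max(ys)),
        "width": math.hypot(a, b),    # length of the image's x edge
        "height": math.hypot(c, d),   # length of the image's y edge
        "rotation_deg": math.degrees(math.atan2(b, a)),
    }


# Hypothetical matrices: an upright image, and one rotated by 90 degrees.
upright = image_placement([200, 0, 0, 100, 50, 600])
rotated = image_placement([0, 200, -100, 0, 300, 400])
```

The resulting coordinates are in the same space the matrix maps into, so any page-level rotation or crop box offset still has to be applied separately.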