I need to detect the CMYK colour of some text in a PDF, just as you can with Acrobat's Print Production/Output Preview feature, but from a VB.NET program.
I've used Ghostscript's inkcov device in the past to do something similar and get the CMYK ink coverage of a specified area of a PDF, but I can't find a way to list the individual CMYK values this way.
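For reference, the inkcov device only reports one summary line per page, roughly the proportion of pixels on which each of C, M, Y and K is used, which is why it cannot list individual colours. A minimal Python sketch of running and parsing it, assuming `gs` is on the PATH and the usual `C M Y K CMYK OK` line format; `run_inkcov` and `parse_inkcov` are hypothetical helper names:

```python
import re
import subprocess

# Matches inkcov result lines such as " 0.02285  0.01677  0.01520  0.00000 CMYK OK"
INKCOV_LINE = re.compile(
    r"^\s*([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+CMYK\s+OK\s*$"
)


def parse_inkcov(output: str) -> list:
    """Return one (C, M, Y, K) coverage tuple per page found in inkcov output."""
    pages = []
    for line in output.splitlines():
        match = INKCOV_LINE.match(line)
        if match:
            pages.append(tuple(float(v) for v in match.groups()))
    return pages


def run_inkcov(pdf_path: str, gs: str = "gs") -> list:
    """Run Ghostscript's inkcov device on a PDF and parse its stdout."""
    result = subprocess.run(
        [gs, "-q", "-o", "-", "-sDEVICE=inkcov", pdf_path],
        capture_output=True, text=True, check=True,
    )
    return parse_inkcov(result.stdout)
```

Each tuple is an aggregate for the whole page, so two different cyans on the same page simply add into one C figure.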
I've tried using SpirePDF to get the coordinates of the text, then producing a bitmap of that area and looping through the pixels to get the RGB values, but I can't get a CMYK value this way. Converting the RGB to CMYK doesn't produce a close enough match to the original colour, so I think I have to work directly on the PDF.
Using Ghostscript to isolate the area I'm interested in, I can probably produce a PDF containing just some text in the colour I want to find, plus the background colour, but how do I get the text colour from there?
Or is there an easier approach?
Is it possible to tell Ghostscript to remove white backgrounds when using the pdfwrite device?
The reason is that, in further processing, the generated PDF has to be overlaid on a letterhead that is also supplied as a PDF.
If the source PostScript already sets the background to white, the resulting PDF also has an explicit white background (a rectangle at the start of each page, sized to the full page and filled with the non-stroking color 'white'). As a result, the generated PDF cannot be overlaid on the letterhead PDF: the white background hides the letterhead completely, so the letterhead doesn't appear in the final PDF.
The application generating the PostScript output with the white background (e.g. some business software) is fixed and cannot be changed, so the changes have to be made when processing its PostScript output.
No, you cannot remove that with Ghostscript and the pdfwrite device.
If the problem is always produced by the same input, then possibly you could write something in PostScript to solve it, but without seeing an example I can't say for sure.
Note that PostScript doesn't have a 'non-stroking' colour, there's only one colour in PostScript.
Another solution, it seems to me, would be to change the Z order: put the letterhead on top of the content, rather than putting the content on top of the letterhead.
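A minimal sketch of that Z-order swap using the third-party pypdf library (an assumption; any PDF library that can stamp one page onto another would do). It assumes the letterhead is page 1 of its own PDF, has no opaque full-page background of its own, and does not cover content that must stay visible; `put_letterhead_on_top` is a hypothetical name:

```python
def put_letterhead_on_top(content_pdf: str, letterhead_pdf: str, out_pdf: str) -> None:
    """Stamp page 1 of letterhead_pdf over every page of content_pdf.

    Drawing the letterhead last means the content's opaque white page
    rectangle ends up underneath the letterhead instead of hiding it.
    """
    from pypdf import PdfReader, PdfWriter  # third-party: pip install pypdf

    letterhead = PdfReader(letterhead_pdf).pages[0]
    writer = PdfWriter()
    for page in PdfReader(content_pdf).pages:
        page.merge_page(letterhead)  # page2 is painted after (above) the existing content
        writer.add_page(page)
    with open(out_pdf, "wb") as fh:
        writer.write(fh)
```

Usage would be something like `put_letterhead_on_top("output.pdf", "letterhead.pdf", "final.pdf")`, with the file names purely illustrative.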
Can anyone tell why the image in this pdf does not display as 100% Cyan?
clrtestc - NOPREBLEND32.PDF
Warning: I probably know just enough about pdf and colour to be dangerous!
I'm pretty sure each colour plane of the image is in a separate image. Here's a blended version if that helps.
I know the ColorSpace is DeviceCMYK.
I'm pretty sure there is only 100% Cyan in the image, at least there was when it went into the PDF converter.
What went in:
CMYK: 100,0,0,0
RGB: 0,255,255
What I measure coming out:
CMYK: 100,27,0,6
RGB: 0,173,238
I'm foxed! Is there some filter affecting the rendering of the PDF?
There are also Magenta, Yellow and Black versions, if they help.
Any help much appreciated.
The PDF file is extraordinarily complicated; it has numerous Forms, some of them nested, most of which are empty. However, there only appears to be one image, which is defined in an Indexed CMYK space, so as far as I can see this is indeed a 100% cyan image.
The extended graphics state does use the Multiply Blend mode, and there is no group and no page group specified, so the colour space used for the blending will depend on the colour model of the output device. If that's a monitor, then it's entirely possible that the resulting output will be RGB.
That's because your CMYK image needs to be converted to RGB in order to be blended using that colour space.
Incidentally, the image is in an Indexed colour space. In your image all the image samples have the same value, that value is then consulted in a lookup table, and that table returns the CMYK components. So no, there is not one image per colour plane, or at least, not in this file.
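To illustrate the lookup, here is a minimal sketch of resolving an Indexed sample against a DeviceCMYK base, following the PDF `[/Indexed base hival lookup]` layout: four bytes per palette entry, each scaled from 0-255 to 0.0-1.0. The two-entry palette in the usage below is invented for the example, not taken from the file, and `indexed_to_cmyk` is a hypothetical name:

```python
def indexed_to_cmyk(index: int, hival: int, lookup: bytes) -> tuple:
    """Resolve one Indexed sample to DeviceCMYK components in the range 0.0-1.0."""
    n = 4  # a DeviceCMYK base has four components per palette entry
    if not 0 <= index <= hival:
        # This sketch simply rejects out-of-range samples.
        raise ValueError(f"index {index} outside 0..{hival}")
    if len(lookup) < (hival + 1) * n:
        raise ValueError("lookup table too short for hival")
    entry = lookup[index * n:(index + 1) * n]
    return tuple(b / 255 for b in entry)
```

With a palette `bytes([255, 0, 0, 0, 0, 0, 0, 0])` (entry 0 = 100% cyan, entry 1 = no ink) and every sample equal to 0, each pixel resolves to `(1.0, 0.0, 0.0, 0.0)`: one image, one palette lookup, no per-plane images.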
To be honest, you're going to have to explain better how you are evaluating the content of the PDF file. As far as I can see, the image is 100% cyan, and when rendered to a CMYK device, it will remain 100% cyan. If you render to an RGB device, it will be converted to RGB. A poor-quality PDF consumer might decide to convert to RGB in the absence of a defined colour space for the blending operation.
Since the blending mode doesn't actually do anything (there's no defined alpha, SMask or any other transparency in the file) you could remove that and see if it sorts out your problem.
Edit
Your screen will be an RGB device, so no matter what the CMYK values in the PDF file are, there won't be any CMYK in the screenshot. The PDF rendering engine will have to convert the CMYK to RGB.
So the PDF rendering engine performs an opaque CMYK->RGB conversion. Then you take a picture of that RGB screen. You load that into an image editing application, and ask it what the RGB values are and presumably what it thinks are the CMYK equivalents.
If the CMYK->RGB calculation that the PDF viewer performs is not the inverse of the calculation that the RGB->CMYK image application performs, then you won't be getting the right values!
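The numbers in the question fit this explanation. Below is a minimal sketch of the uncalibrated "textbook" CMYK/RGB formulas (no ICC profiles; not what Acrobat or any particular viewer actually uses). The naive conversion of 100% cyan gives RGB 0,255,255, yet the viewer displayed 0,173,238, presumably via a colour-managed conversion. Feeding that RGB back through the naive inverse gives roughly CMYK 100,27,0,7, within a point of the measured 100,27,0,6 (the image application presumably truncates where this sketch rounds):

```python
def naive_cmyk_to_rgb(c, m, y, k):
    """Uncalibrated CMYK (components 0.0-1.0) to 8-bit RGB; no colour management."""
    return tuple(round(255 * (1 - v) * (1 - k)) for v in (c, m, y))


def naive_rgb_to_cmyk(r, g, b):
    """Uncalibrated 8-bit RGB to CMYK percentages: the usual textbook inverse."""
    rp, gp, bp = r / 255, g / 255, b / 255
    k = 1 - max(rp, gp, bp)
    if k == 1:  # pure black: avoid dividing by zero
        return (0, 0, 0, 100)
    c, m, y = ((1 - v - k) / (1 - k) for v in (rp, gp, bp))
    return tuple(round(100 * v) for v in (c, m, y, k))


print(naive_cmyk_to_rgb(1, 0, 0, 0))   # (0, 255, 255)
print(naive_rgb_to_cmyk(0, 173, 238))  # (100, 27, 0, 7)
```

The "extra" magenta and black are not in the PDF; they come from pairing one conversion in the viewer with a different inverse in the image editor.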
There's no way to predict what the RGB intermediate values 'should' be, because there is no 'right' answer here. Fundamentally this isn't a reliable technique for evaluating the colour.
It's hard to make any kind of recommendation without knowing what you are trying to achieve (and possibly why), and what tools you are prepared to use. I believe Acrobat Pro would allow you to look at the colour values directly for example. Or you could use something like Ghostscript to create a CMYK TIFF file, then open that in an image application which supports CMYK (like Photoshop) and look at the values there.
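A minimal sketch of the Ghostscript route, assuming `gs` is on the PATH and Pillow is installed: render one page to a 32-bit CMYK TIFF with the `tiff32nc` device, optionally crop to the area of interest, and list the most frequent CMYK pixel values. The function names are hypothetical, and the values come back as 0-255 per channel rather than percentages:

```python
def gs_cmyk_tiff_command(pdf_path, tiff_path, page=1, dpi=300, gs="gs"):
    """Build a Ghostscript command line that renders one page to a CMYK TIFF."""
    return [
        gs, "-q", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=tiff32nc",  # 8 bits per channel, C M Y K, uncompressed
        f"-r{dpi}",
        f"-dFirstPage={page}", f"-dLastPage={page}",
        f"-sOutputFile={tiff_path}",
        pdf_path,
    ]


def distinct_cmyk_values(tiff_path, box=None, top=10):
    """Return the `top` most frequent (count, (C, M, Y, K)) pairs, 0-255 per channel.

    box is an optional (left, upper, right, lower) pixel rectangle to inspect.
    """
    from PIL import Image  # Pillow, imported lazily so the command builder works without it

    with Image.open(tiff_path) as img:
        if img.mode != "CMYK":
            raise ValueError(f"expected a CMYK image, got {img.mode}")
        region = img.crop(box) if box else img
        colours = region.getcolors(maxcolors=region.width * region.height)
    return sorted(colours, key=lambda item: item[0], reverse=True)[:top]
```

Typical use would be `subprocess.run(gs_cmyk_tiff_command("in.pdf", "page1.tif"), check=True)` followed by `distinct_cmyk_values("page1.tif", box=...)` with the text's pixel coordinates; usually the most frequent entry is the background and the next ones are the text colours. The same Ghostscript command line can equally be launched from VB.NET with `Process.Start`.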
But rendering to the screen, taking a screenshot and trying to figure out what the CMYK values might or might not have been is not really going to work.
I'm trying to use dcraw on a color image (e.g. CR or NEF) to extract raw monochrome data for image processing.
With parameters -4 -D -c I get an image with a checkerboard as shown below:
When viewed unzoomed, the image data looks correct, apart from the checkerboard pattern, which appears in all images from different cameras.
The above image was produced using -T and zooming into the resulting .tiff file in File Viewer Plus. In practice, I'm reading the .pgm file directly and getting the same checkerboard.
What aren't I understanding? Does this have something to do with Bayer filtering?
Yes, this is due to the Bayer filter and the absence of demosaicing. For example, in green areas the green-filtered pixels will be brighter than the red-filtered ones, according to the Bayer pattern, whereas in red areas the green-filtered pixels will be dark.
To get a correct grayscale (or color) image, intensity has to be weighted over a 2x2 area (in a standard Bayer pattern). What you are looking for cannot be achieved without the demosaicing step.
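A minimal pure-Python sketch of that 2x2 weighting, assuming the raw frame is already loaded as a list of rows (for example from dcraw's `-D` PGM output). Each 2x2 Bayer cell holds one red, two green and one blue sample whichever of the four standard layouts the camera uses, so a plain average removes the checkerboard at half resolution. It ignores white balance and is only a crude stand-in for real demosaicing; `bayer_to_gray_2x2` is a hypothetical name:

```python
def bayer_to_gray_2x2(raw, weights=(0.25, 0.25, 0.25, 0.25)):
    """Collapse each 2x2 Bayer cell of a raw mosaic into one grayscale value.

    raw: list of rows of sensor values (e.g. from dcraw -D).
    weights: applied as (top-left, top-right, bottom-left, bottom-right);
    the default is a plain average. An odd last row or column is dropped.
    """
    height, width = len(raw), len(raw[0])
    gray = []
    for y in range(0, height - 1, 2):
        row = []
        for x in range(0, width - 1, 2):
            cell = (raw[y][x], raw[y][x + 1], raw[y + 1][x], raw[y + 1][x + 1])
            row.append(sum(w * v for w, v in zip(weights, cell)))
        gray.append(row)
    return gray


# A 2x4 checkerboard like the one in the question becomes one flat row.
print(bayer_to_gray_2x2([[100, 200, 100, 200], [200, 50, 200, 50]]))  # [[137.5, 137.5]]
```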
Your best bet is to extract a color image, then turn it into grayscale.
I have about 400 pdfs with a lot of dead space between the text and the page border.
Usually I use govert's pdf cropper to crop all the whitespace, but this time the PDF background color is (darn!) yellow, and no software that I know of (and I've searched for quite a while) can crop non-white space (well, except maybe pdfcrop.pl, a Perl library which supposedly can remove black space).
Does anybody know of software that can perform such a task?
The ideal app, I guess, would let you specify the color to remove, such as rgb(192,192,192).
Thanks in advance.
The reason this is so difficult is that PDF has no concept of paper color or background color. So what you're seeing is not a different background color, but an object (typically a rectangle) painted in that yellow background color.
Most cropping tools simply calculate the bounding box of all objects on the page and then crop away everything outside that bounding box. Of course that doesn't work for your file because the bounding box will include the background rectangle object.
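To make the bounding-box problem concrete, here is a toy sketch with a hypothetical `DrawnObject` record (a box plus its fill colour) standing in for whatever a real PDF library would report. Including the yellow page-sized rectangle makes the union cover the whole page; skipping objects of a given fill colour, which is the idea behind removing objects by colour, gives the real content box:

```python
from typing import NamedTuple, Optional


class DrawnObject(NamedTuple):
    """Hypothetical, simplified page object: a bounding box plus its fill colour."""
    x0: float
    y0: float
    x1: float
    y1: float
    fill: tuple  # e.g. RGB components 0-255


def content_bbox(objects, ignore_fill: Optional[tuple] = None):
    """Union of object bounding boxes, optionally skipping one fill colour."""
    kept = [o for o in objects if o.fill != ignore_fill]
    if not kept:
        return None
    return (min(o.x0 for o in kept), min(o.y0 for o in kept),
            max(o.x1 for o in kept), max(o.y1 for o in kept))


page = [
    DrawnObject(0, 0, 612, 792, (255, 255, 0)),  # yellow background rectangle
    DrawnObject(72, 100, 540, 700, (0, 0, 0)),   # body text
    DrawnObject(90, 60, 300, 80, (0, 0, 0)),     # footer
]
print(content_bbox(page))                              # (0, 0, 612, 792)
print(content_bbox(page, ignore_fill=(255, 255, 0)))   # (72, 60, 540, 700)
```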
There are potentially a number of directions you could take this:
1) If all pages need to be cropped by the same amount, you could attempt to do cropping that way (simply passing a rectangle to the cropping tool to do the actual cropping).
2) There are tools (callas pdfToolbox, with which I should note I'm associated; Enfocus PitStop; ...) that let you remove objects from a document, for example by specifying your yellow color. That would let you modify the PDF file by removing the background object and then perform the cropping you want.
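For option 1, a minimal sketch using the third-party pypdf library (an assumption; any tool that can set a page's CropBox would do): compute one crop rectangle from fixed margins in PDF points and apply it to every page. `fixed_crop_box` and `crop_all_pages` are hypothetical names, and the sketch ignores page rotation:

```python
def fixed_crop_box(page_box, margins):
    """Shrink an (x0, y0, x1, y1) page box by (left, bottom, right, top) margins."""
    x0, y0, x1, y1 = page_box
    left, bottom, right, top = margins
    box = (x0 + left, y0 + bottom, x1 - right, y1 - top)
    if box[0] >= box[2] or box[1] >= box[3]:
        raise ValueError("margins leave no visible area")
    return box


def crop_all_pages(src, dst, margins):
    """Apply the same margins to every page's CropBox using pypdf."""
    from pypdf import PdfReader, PdfWriter  # third-party, imported lazily
    from pypdf.generic import RectangleObject

    reader = PdfReader(src)
    writer = PdfWriter()
    for page in reader.pages:
        media = tuple(float(v) for v in page.mediabox)
        page.cropbox = RectangleObject(fixed_crop_box(media, margins))
        writer.add_page(page)
    with open(dst, "wb") as fh:
        writer.write(fh)
```

For example, `fixed_crop_box((0, 0, 612, 792), (36, 36, 36, 36))` trims half an inch from each edge of a US Letter page, giving `(36, 36, 576, 756)`.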
I am trying to convert a PDF into a TIFF using Ghostscript. Is it possible to remove the grey background of a text block (black font color) in a PDF using Ghostscript? I would like to replace the grey background with white.
Appreciate your help!!
I don't think you'll get a generic solution to your problem because there are many different ways such a background may be coded in your PDF and there is no sure way to distinguish such a background from a rectangular form of some vector image.
PDF essentially offers a set of tools for positioning glyphs and vector graphics within a rectangle (the page) for display, plus some additional tools for adding interactivity (e.g. forms). Thus, a colored background in a PDF is generally created by drawing a path along the edge of the background area, filling that shape with the desired color, and positioning glyphs and graphics (text and images) on top of it. Other operators can be used too, in many variants, and generally the shape created is not marked as background.
In the answer Dingo refers to in his comment, a rectangle covering the whole page (actually even a bit more, given a fairly common choice of media box) is drawn (m: move to a corner; 4*l: draw the 4 edge lines; h: close the path; f: fill the shape).
So please make the PDF in question available for inspection; maybe there is a specific solution for your file.
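To show what such a background looks like to a program, here is a deliberately tiny, hypothetical sketch that scans an uncompressed content stream (for example after running the file through qpdf's QDF mode) for paths built with m/l/h or re and painted with f, reporting each box with the last gray (g) or RGB (rg) fill colour. It ignores transformation matrices (cm), q/Q state, other colour operators and string operands, so it can flag candidate rectangles but, as explained above, cannot know which of them is really a background:

```python
import re

# Very small tokenizer: numbers and operator names only. Real content streams also
# contain strings, names, arrays, dictionaries and inline images, which this ignores.
TOKEN = re.compile(r"[-+]?(?:\d+\.?\d*|\.\d+)|[A-Za-z*'\"]+")


def filled_rectangles(content: str):
    """Yield (bbox, fill) for paths built from m/l/h or re and painted with f."""
    stack, points, fill = [], [], None
    for tok in TOKEN.findall(content):
        if tok[0].isdigit() or tok[0] in "+-.":
            stack.append(float(tok))  # operand: keep until the next operator
            continue
        if tok in ("m", "l"):
            points.append((stack[-2], stack[-1]))
        elif tok == "re":
            x, y, w, h = stack[-4:]
            points += [(x, y), (x + w, y + h)]
        elif tok == "g":
            fill = ("gray", stack[-1])
        elif tok == "rg":
            fill = ("rgb", *stack[-3:])
        elif tok in ("f", "F", "f*"):
            if points:
                xs, ys = zip(*points)
                yield (min(xs), min(ys), max(xs), max(ys)), fill
            points = []
        elif tok in ("n", "S", "s", "B", "B*", "b", "b*"):
            points = []  # path ended without a plain fill: drop it
        stack = []  # every operator consumes its operands


stream = "0.8 g\n0 0 m\n612 0 l\n612 792 l\n0 792 l\n0 0 l\nh\nf\n0 g\n72 700 100 12 re f\n"
for bbox, colour in filled_rectangles(stream):
    print(bbox, colour)
```

A heuristic such as "a light-coloured filled rectangle at least as large as the media box" could then pick out likely backgrounds, but that remains a guess for each specific file.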