How to add text to an existing PDF at a fixed position?

I must insert a number at a fixed position in an existing A4 PDF.
I've tried the following as a first test, but it doesn't work (no text is added).
What goes wrong?
Here's my code:
byte[] omrMarks = omrFrame.getOmrImage();
Jpeg img = new Jpeg(omrMarks);
PdfImportedPage page = stamper.getImportedPage(source, pageNum);
PdfContentByte pageContent = stamper.getOverContent(pageNum);
pageContent.addImage(
img, img.getWidth(), 0, 0, img.getHeight(), 15f, (page.getHeight() - 312));
pageContent.moveTo(10, 200);
pageContent.beginText();
pageContent.setLiteral("Test");
pageContent.endText();

There are many issues with this question.
This is certainly wrong:
pageContent.moveTo(10, 200);
pageContent.beginText();
pageContent.setLiteral("Test");
pageContent.endText();
The moveTo() method doesn't make sense; it has no effect on the text state object.
The text state object is illegal because there's no setFontAndSize() (it's very odd that this doesn't throw a RuntimeException, are you using an obsolete version of iText?)
The setLiteral() method should only be used to add some literal PDF syntax to a content stream.
For instance, something like:
pageContent.setLiteral("\n100 100 m\n100 200 l\nS\n");
should only be used if you understand that the following PDF syntax draws a line:
100 100 m
100 200 l
S
It's clear from your question that you don't understand PDF syntax, so you shouldn't use these methods. Instead you should use convenience methods such as the showTextAligned() method, which hide the complexity of PDF and save you a couple of lines.
Maybe you have a good reason to opt for the "hard way", but in that case, you should read the documentation, otherwise you'll continue using methods such as setLiteral() instead of showText(), moveTo() instead of moveText(), and so on, resulting in code you don't want your employer to see.
Furthermore, you're making the assumption that the lower left corner of the page has the coordinates (0,0). That's probably true for the majority of PDF documents found in the wild, but that's not true for all PDF documents. The MediaBox doesn't have to be [0 0 595 842], it could as well be [595 842 1190 1684]. Moreover: what if there's a CropBox? Maybe you're adding content that isn't visible because it's cropped away...
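To make the point about the page origin concrete, here is a minimal sketch in plain Java (no iText calls). The class PageBox and its method names are hypothetical; the four numbers stand for whatever box (MediaBox or CropBox) you read from the page. It only shows that a position meant as "10 units right and 200 units up from the visible lower-left corner" has to be offset by the box's lower-left coordinates.

```java
// Hypothetical helper: models a page box [llx lly urx ury] as found in a PDF.
final class PageBox {
    final float llx, lly, urx, ury;

    PageBox(float llx, float lly, float urx, float ury) {
        this.llx = llx;
        this.lly = lly;
        this.urx = urx;
        this.ury = ury;
    }

    // Absolute x for a point dx units to the right of the box's left edge.
    float absoluteX(float dx) {
        return llx + dx;
    }

    // Absolute y for a point dy units above the box's bottom edge.
    float absoluteY(float dy) {
        return lly + dy;
    }

    // True if the absolute point (x, y) lies inside the box, i.e. can be visible.
    boolean contains(float x, float y) {
        return x >= llx && x <= urx && y >= lly && y <= ury;
    }

    public static void main(String[] args) {
        PageBox box = new PageBox(595f, 842f, 1190f, 1684f);
        // The naive position (10, 200) is outside this box.
        System.out.println(box.contains(10f, 200f));
        // The offset position is inside it.
        System.out.println(box.contains(box.absoluteX(10f), box.absoluteY(200f)));
    }
}
```

With the second MediaBox mentioned above, the naive (10, 200) falls outside the page, while (605, 1042) lands where you intended.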

cannot access width/height properties of PImage object in setup()

I'm working with the PImage class. Normally I make 2 PImage objects, load an image into one of them (my input picture) and create a blank image using createImage(), which will become the output. I then use the loadPixels() method to access the data on the input, do some manipulation then set the respective output pixel to the result. I have not had any trouble with this so far.
The dimensions of the input and the output PImage objects need to be the same to make the pixel-by-pixel manipulations as straightforward as possible.
So here is the pickle:
PImage myinput;
PImage myoutput;
void setup() {
size(350, 350);
myinput = loadImage("myfile.jpg");
// the pic is 300 x 300
//myoutput = createImage(myinput.width, myinput.height, RGB);
//I've hardcoded the width and height below
myoutput = createImage(300, 300, RGB);
}
void draw() {
image(myoutput, 0, 0);
}
The result of the above is a black square 300 x 300 which overlaps a grey canvas of 350 x 350. Given the code I've written, this is the result I would expect.
Now, in the above example, I've hardcoded the width and height of 'myoutput' with the line:
myoutput = createImage(300, 300, RGB);
My question relates to the bit that follows:
Instead of hardcoding the values, I would rather do something like this:
myoutput = createImage(myinput.width, myinput.height, RGB);
But it isn't working. I just get a big 350 x 350 grey box, and I'm not sure why, though I do have my suspicions. When I work with pictures in JavaScript, I have to wait for the page to load (using an event handler like window.onload) before I can access the width/height properties of an image.
UPDATE:
I saw another post which had the following:
/* #pjs preload="myfile.jpg"; */
So I just included this before I declared my PImage objects and now the following line works.
myoutput = createImage(myinput.width, myinput.height, RGB);
I'm quite confused by the new piece of code.
When you run your sketch in Java mode, you're running as Java. Java loads images synchronously, which means that the code won't continue running until the image is fully loaded. That's why it works in Java mode.
But when you're running using Processing.js, you're running as JavaScript. JavaScript loads images asynchronously, which means that the image is loaded in the background while your code continues. That means you aren't guaranteed that the image is done loading when the next line executes, which is why the image's width and height are unset.
The preload command tells Processing.js to load the images before the sketch starts executing, so that you're guaranteed that the image loads before you try to access its width and height.
From the Processing.js reference:
This directive regulates image preloading, which is required when using loadImage() or requestImage() in a sketch. Using this directive will preload all images indicated between quotes, and comma separated if multiple images are used, so that they will be ready for use when the sketch begins running. As resources are loaded via the AJAX approach, not using this directive will result in the sketch loading an image, and then immediately trying to use this image in some way, even though the browser has not finished downloading and caching it.
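The difference can be simulated in plain Java. This is only a sketch of the timing issue, not Processing or Processing.js code: FakeImage, AsyncLoadDemo and loadImageAsync are hypothetical names, and a CompletableFuture stands in for the browser's background download.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical stand-in for an image whose size is unknown until it has loaded.
final class FakeImage {
    volatile int width = 0;
    volatile int height = 0;
}

final class AsyncLoadDemo {
    // Returns immediately; the dimensions are filled in only when `download` completes.
    static FakeImage loadImageAsync(CompletableFuture<int[]> download) {
        FakeImage img = new FakeImage();
        download.thenAccept(size -> {
            img.width = size[0];
            img.height = size[1];
        });
        return img;
    }

    public static void main(String[] args) {
        CompletableFuture<int[]> download = new CompletableFuture<>();
        FakeImage img = loadImageAsync(download);

        // The "next line" runs before the download has finished: size is still unset.
        System.out.println("before load: " + img.width + " x " + img.height);

        // Simulate the download finishing (what preloading guarantees up front).
        download.complete(new int[] {300, 300});
        System.out.println("after load: " + img.width + " x " + img.height);
    }
}
```

Calling createImage(myinput.width, myinput.height, RGB) in the "before load" state is what produces the empty result; preloading moves the sketch start to the "after load" state.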

Detecting Headers and Borders in PDF Tables using PDF Clown

I am using PDF Clown's TextInfoExtractionSample to extract a PDF table into Excel and I was able to do it except for merged cells. In the below code, for object, "content" I see the scanned content as text, XObject, ContainerObject but nothing for borders. Anyone know what object represents borders in PDF table OR how to detect if a text is a header of the table?
private void Extract(ContentScanner level, PrimitiveComposer composer)
{
if(level == null)
return;
while(level.MoveNext())
{
ContentObject content = level.Current;
}
}
I am using PDF Clown's TextInfoExtractionSample...
In the below code, for object, "content" I see the scanned content as text, XObject, ContainerObject but nothing for borders.
while(level.MoveNext())
{
ContentObject content = level.Current;
}
A) Visit all content
In your loop code you removed very important blocks from the original example,
if(content is XObject)
{
// Scan the external level!
Extract(((XObject)content).GetScanner(level), composer);
}
and
if(content is ContainerObject)
{
// Scan the inner level!
Extract(level.ChildLevel, composer);
}
These blocks make the sample recurse into complex objects (the XObject, ContainerObject you mention) which in turn contain their own simple content.
B) Inspect all content
Anyone know what object represents borders in PDF table
Unfortunately there is nothing like a border attribute in PDF content. Instead, borders are independent objects, usually vector graphics, either lines or very thin rectangles.
Thus, while scanning the page content (recursively, as indicated in A) you will have to look for Path instances (namespace org.pdfclown.documents.contents.objects) containing
moveTo m, lineTo l, and stroke S operations or
rectangle re and fill f operations.
(This answer may help)
When you come across such lines, you will have to interpret them. These lines may be borders, but they may also be used as underlines, page decorations, ...
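As a rough illustration of the first filtering step, here is a sketch in plain Java (not PDF Clown code). It assumes you have already reduced each path to its list of operator names plus, for rectangles, a width and height; the class BorderCandidates, its method names, and the thickness threshold are all hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: decides whether a path looks like a possible table border.
final class BorderCandidates {
    // True for a stroked line: one moveTo (m), one or more lineTo (l), then stroke (S).
    static boolean isStrokedLine(List<String> ops) {
        int n = ops.size();
        if (n < 3 || !ops.get(0).equals("m") || !ops.get(n - 1).equals("S")) {
            return false;
        }
        for (int i = 1; i < n - 1; i++) {
            if (!ops.get(i).equals("l")) {
                return false;
            }
        }
        return true;
    }

    // True for a filled rectangle (re, f) whose shorter side is at most maxThickness.
    static boolean isThinFilledRectangle(List<String> ops, double width, double height,
                                         double maxThickness) {
        return ops.size() == 2
                && ops.get(0).equals("re")
                && ops.get(1).equals("f")
                && Math.min(Math.abs(width), Math.abs(height)) <= maxThickness;
    }

    public static void main(String[] args) {
        System.out.println(isStrokedLine(Arrays.asList("m", "l", "S")));
        System.out.println(isThinFilledRectangle(Arrays.asList("re", "f"), 200.0, 0.5, 1.0));
    }
}
```

A path that passes either test is only a candidate; whether it is a border, an underline or decoration still has to be decided from its position relative to the text.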
If the PDF happens to be tagged, things may be a bit easier insofar as you have to interpret less. Instead you can read the tagging information, which may tell you where a cell starts and ends, so you do not need to interpret graphical lines. Unfortunately, untagged PDFs still outnumber tagged ones.
OR how to detect if a text is a header of the table?
Just as above, unless you happen to inspect a tagged PDF, there is nothing immediately telling you some text is a table header. You have to interpret again. Is that text outside of lines you determined to form a table? Is it inside at the top? Or just anywhere inside? Is it drawn in a specific font? Or larger? Different color? Etc.

iText throws ClassCastException: PdfNumber cannot be cast to PdfLiteral

I am using iText v5.5.1 to read a PDF and render plain text from it:
pdfReader = new PdfReader(new CloseShieldInputStream(is));
pdfParser = new PdfReaderContentParser(pdfReader);
int maxPageNumber = pdfReader.getNumberOfPages();
int pageNumber = 1;
StringBuilder sb = new StringBuilder();
SimpleTextExtractionStrategy extractionStrategy = new SimpleTextExtractionStrategy();
while (pageNumber <= maxPageNumber) {
pdfParser.processContent(pageNumber, extractionStrategy);
sb.append(extractionStrategy.getText());
pageNumber++;
}
On one PDF file the following exception is thrown:
java.lang.ClassCastException: com.itextpdf.text.pdf.PdfNumber cannot be cast to com.itextpdf.text.pdf.PdfLiteral
at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.processContent(PdfContentStreamProcessor.java:382)
at com.itextpdf.text.pdf.parser.PdfReaderContentParser.processContent(PdfReaderContentParser.java:80)
That PDF file seems to be broken, but maybe its contents still makes sense...
Indeed
That PDF file seems to be broken
The content streams of all pages look like this:
/GS1 gs
q
595.00 0 0
It looks like they are all cut off early, as the last line is not a complete operation. This can certainly make a parser hiccup, as iText's does.
Furthermore, the content should be longer, because even the compressed stream is a bit larger than this fragment. This indicates that the streams are broken at the byte level.
Looking at the bytes of the PDF file one cannot help but notice that
even inside binary streams the codes 13 and 10 only occur together and
cross-reference offset values are less than the actual positions.
So I assume that this PDF has been transmitted using a transport method handling it as textual data, especially replacing any kind of assumed line break (CR or LF or CR LF) with the CR LF now omnipresent in the file (CR = Carriage Return = 13; LF = Line Feed = 10). Such replacements will automatically break any compressed data stream like the content streams in your file.
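The first observation can be checked mechanically. Here is a minimal sketch in plain Java; the class and method names are hypothetical, and you would feed it the raw bytes of the file or of a single binary stream.

```java
// Hypothetical helper: detects the "13 and 10 only occur together" pattern.
final class LineBreakCheck {
    // True if bytes 13 (CR) and 10 (LF) never occur except as a CR LF pair.
    static boolean onlyCrLfPairs(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == 13) {
                if (i + 1 >= data.length || data[i + 1] != 10) {
                    return false; // bare CR
                }
                i++; // skip the LF that completes the pair
            } else if (data[i] == 10) {
                return false; // bare LF
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(onlyCrLfPairs(new byte[] {1, 13, 10, 2}));
        System.out.println(onlyCrLfPairs(new byte[] {1, 10, 2}));
    }
}
```

For a compressed stream of any real size, a true result is a strong hint rather than proof: genuinely binary data would almost always contain some bare 13 or 10 bytes.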
Unfortunately, though...
but maybe its contents still makes sense
Not much. There is one big image associated with each page. Considering the small size of the content streams and the large image size, I would assume that the PDF only contains scanned pages. But the images are also broken by the replacements mentioned above.
This isn't the best solution, but I had this exact problem and unfortunately can't share the exact PDFs I was having issues with.
I made a fork of itextpdf that catches the ClassCastException and just skips PdfObjects that it takes issue with. It prints to System.out what the text contained and what type itextpdf thinks it was. I haven't been able to map this out to some systemic problem with my PDFs (someone smarter than me will need to do that), and this exception only happens once in a blue moon. Anyway, in case it helps anyone, this fork at least doesn't crash your code, lets you parse the majority of your PDFs, and gives you a bit of info on what types of bytestrings seem to give itextpdf indigestion.
https://github.com/njhwang/itextpdf

Create pdf with tooltips in R

Simple question: Is there a way to plot a graph from R in a pdf file and include tooltips?
Simple question: Is there a way to plot a graph from R in a pdf file and include tooltips?
There's always a way. But the devil is in the details, so the real question is: are you willing to get your hands dirty?
PDF does support tooltips for certain kinds of objects such as hyperlinks. So, if there is a way to insert raw PDF statements indicating there should be a hyperlink-like object at some position in your plot, then there is a way to pop up a tooltip.
Now, the only way I know of to generate and compile raw PDF statements is to create the document using TeX. There are definitely other ways to do it, but this is the one I am familiar with. The following examples use a graphics device that Cameron Bracken and I wrote, the tikzDevice, to render R graphics to LaTeX code. The tikzDevice has preliminary support for injecting arbitrary LaTeX code into the graphics output stream through the tikzAnnotate function---we will be using this to drop PDF callouts into the plot.
The steps involved are:
Set up a LaTeX macro to generate the PDF commands required to produce a callout.
Set up an R function that uses tikzAnnotate to invoke the LaTeX macro at specific points in the plot.
???
Profit!
In the examples that follow, one major caveat is attached to step 2. The coordinate calculations used will only work with base graphics, not grid graphics such as ggplot2.
Simple Tooltips
Step 1
The tikzDevice allows you to create R graphics that include the execution of arbitrary LaTeX commands. Usually this is done to insert things like $\alpha$ into plot titles to generate Greek letters, α, or complex equations. Here we are going to use this feature to invoke some raw PDF voodoo.
Any LaTeX macros that you wish to be available during the generation of a tikzDevice graphic need to be defined up-front by setting the tikzLatexPackages option. Here we are going to append some stuff to that declaration:
require(tikzDevice) # So that default options are set
options(tikzLatexPackages = c(
getOption('tikzLatexPackages'), # The original contents: required stuff
# Avert your eyes for a sec, all will be explained below
"\\def\\tooltiptarget{\\phantom{\\rule{1mm}{1mm}}}",
"\\newbox\\tempboxa\\setbox\\tempboxa=\\hbox{}\\immediate\\pdfxform\\tempboxa \\edef\\emptyicon{\\the\\pdflastxform}",
"\\newcommand\\tooltip[1]{\\pdfstartlink user{/Subtype /Text/Contents (#1)/AP <</N \\emptyicon\\space 0 R >>}\\tooltiptarget\\pdfendlink}"
))
If all that quoted nonsense were to be written out as LaTeX code by someone who cared about readability, it would look like this:
\def\tooltiptarget{\phantom{\rule{1mm}{1mm}}}
\newbox\tempboxa
\setbox\tempboxa=\hbox{}
\immediate\pdfxform\tempboxa
\edef\emptyicon{\the\pdflastxform}
\newcommand\tooltip[1]{%
\pdfstartlink user{%
/Subtype /Text
/Contents (#1)
/AP <<
/N \emptyicon\space 0 R
>>
}%
\tooltiptarget%
\pdfendlink%
}
For those programmers who have never taken a walk on the wild side and done some "programming" in TeX, here's a blow-by-blow for the above code (as I understand it anyway, TeX can get very weird---especially when you get down in the trenches with it):
Line 1: Define an object, tooltiptarget, which is non-visible (a phantom) and is a 1mm x 1mm rectangle (a rule). This will be the onscreen area which we will use to detect mouseovers.
Line 2: Create a new box, which is like a "page fragment" of typeset material. It can contain pretty much anything, including other boxes (sort of like an R list). Call it tempboxa.
Line 3: Assign the contents of tempboxa to contain an empty box that arranges its contents using a horizontal layout (which is unimportant, could have used a vbox or other box).
Line 4: Create a PDF Form XObject using the contents of tempboxa. A Form XObject can be used by PDF files to store graphics, like logos, that may be used over and over. Here we are creating a "blank icon" that we can use later to cut down on visual clutter. TeX defers output operations, like writing objects to a PDF file, until certain conditions have been met---such as a page has filled up. Immediate makes sure this operation is not deferred.
Line 5: This line captures an integer value that serves as a reference to the PDF XObject we just created and assigns it the name emptyicon.
Line 6: Starts the definition of a new macro called tooltip that takes 1 argument which is referred to in the body of the macro as #1. Each line in the macro ends with a comment character, %, to keep TeX from noticing the newlines that have been added for readability (newlines can have strange effects inside macros).
Line 7: Output raw PDF commands (pdfstartlink). This begins the creation of a new PDF annotation object (/Type /Annot), of which there are about 20 different subtypes, among them hyperlinks. Every line following this contains raw PDF markup with a couple of TeX macros.
Line 8: Declare the annotation subtype we are going to use. Here I am going with a plain Text annotation such as a comment or sticky note.
Line 9: Declare the contents of the annotation. This will be the contents of our tooltip and is set to #1, the argument to the tooltip macro.
Lines 10-12: Normally text annotations are marked by an icon, such as a sticky note, to highlight their location in the text. This behavior will cause a visual mess if we allow it to splatter little sticky notes all over our graphs. Here we use an appearance dictionary (/AP << >>) to set the "normal" annotation icon (/N) to be the blank icon we created earlier. The integer we stored in emptyicon along with 0 R forms a reference to the Form XObject we made on Line 4 using an empty box.
Line 14: If we were making a normal hyperlink, here is where the text/image/whatever would go that would serve as the link body. Instead we insert tooltiptarget, our invisible phantom object which does not show up on the screen but does react to mouseovers.
Step 2
All right, now that we have told LaTeX how to create tooltips, it is time to make them usable from R. This involves writing a function that will take coordinates on our graph, such as (1,1), and convert them into canvas or "device" coordinates. In the case of the tikzDevice the required measurement is "TeX points" (1/72.27 of an inch) from the absolute bottom left of the plotting area. Fortunately for base graphics, there are handy functions to calculate this for us. Grid graphics work differently, so the approach taken in the examples here won't work for them.
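As a side note on the unit: a TeX point is 1/72.27 of an inch, while the point used inside PDF files is 1/72 of an inch, so the two are close but not interchangeable. A small sketch of the arithmetic in Java (the class and method names are hypothetical, and none of this is needed when you let R do the conversion):

```java
// Hypothetical helper: conversions between inches, TeX points and PDF points.
final class TexUnits {
    static final double TEX_POINTS_PER_INCH = 72.27;
    static final double PDF_POINTS_PER_INCH = 72.0;

    // Length in TeX points for a length given in inches.
    static double inchesToTexPoints(double inches) {
        return inches * TEX_POINTS_PER_INCH;
    }

    // Length in PDF points for a length given in TeX points.
    static double texPointsToPdfPoints(double texPoints) {
        return texPoints * PDF_POINTS_PER_INCH / TEX_POINTS_PER_INCH;
    }

    public static void main(String[] args) {
        System.out.println(inchesToTexPoints(1.0));
        System.out.println(texPointsToPdfPoints(72.27));
    }
}
```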
The final task for our R function is to call tikzAnnotate to insert a TikZ "node" into the output stream that is located at the coordinates we computed. Nodes can contain arbitrary TeX commands, in this case we will be calling upon tooltip.
Here is an R function that contains the above functionality:
place_PDF_tooltip <- function(x, y, text){
# Calculate coordinates
tikzX <- round(grconvertX(x, to = "device"), 2)
tikzY <- round(grconvertY(y, to = "device"), 2)
# Insert node
tikzAnnotate(paste(
"\\node at (", tikzX, ",", tikzY, ") ",
"{\\tooltip{", text, "}};",
sep = ''
))
invisible()
}
Step 3
Try it out on a plot:
# standAlone creates a complete LaTeX document. Default output makes
# stripped-down graphs meant for inclusion in other LaTeX documents
tikz('tooltips_ahoy.tex', standAlone = TRUE)
plot(1,1)
place_PDF_tooltip(1,1, 'Hello, World!')
dev.off()
require(tools)
texi2dvi('tooltips_ahoy.tex', pdf = TRUE)
Step 4
Behold the result (download a pdf):
Advanced Tooltips
Step 1
So, now that we have simple tooltips out of the way, why not crank it to 11? In the previous example, we used an empty hbox to get rid of the tooltip icon. But what if we had put something in that box, like text or a drawing? And what if there was a way to make it so that the icon only appeared during mouseover events?
The following TeX macro is a little rough around the edges, but it shows that this is possible:
\usetikzlibrary{shapes.callouts}
\tikzset{tooltip/.style = {
rectangle callout,
draw,
callout absolute pointer = {(-2em, 1em)}
}}
\def\tooltiptarget{\phantom{\rule{1mm}{1mm}}}
\newbox\tempboxa
\newcommand\tooltip[1]{%
\def\tooltipcallout{\tikz{\node[tooltip]{#1};}}%
\setbox\tempboxa=\hbox{\phantom{\tooltipcallout}}%
\immediate\pdfxform\tempboxa%
\edef\emptyicon{\the\pdflastxform}%
\setbox\tempboxa=\hbox{\tooltipcallout}%
\immediate\pdfxform\tempboxa%
\edef\tooltipicon{\the\pdflastxform}%
\pdfstartlink user{%
/Subtype /Text
/Contents (#1)
/AP <<
/N \emptyicon\space 0 R
/R \tooltipicon\space 0 R
>>
}%
\tooltiptarget%
\pdfendlink%
}
The following modifications have been made compared to the simple callout.
The shapes.callouts library is loaded which contains templates for TikZ to use when drawing callout boxes.
A tooltip style is defined which contains some TikZ graphics boilerplate. It specifies a rectangular callout box that is to be visible (draw). The callout absolute pointer business is a hack because I've had too many beers by this point to figure out how to place annotation icons using dynamically generated PDF primitives. This relies on the default anchoring of icons at their upper left corner and so pulls the pointer of the callout box toward that location. The result is that the boxes will always appear to the lower right of the pointer and if the callout text is long enough, they won't look right.
Inside the macro, the tooltip is generated using a one-shot tikz command that is stuffed into the tooltipcallout macro. A form XObject is generated from tooltipcallout and assigned to tooltipicon.
emptyicon is also dynamically generated by evaluating tooltipcallout inside of phantom. This is required because the size of the default icon apparently sets the viewport available for the rollover icon.
When generating PDF commands, a new entry is added to the /AP dictionary, /R for rollover, that uses the XObject referenced by tooltipicon.
The ready to consume R version is:
require(tikzDevice)
options(tikzLatexPackages = c(
getOption('tikzLatexPackages'),
"\\usetikzlibrary{shapes.callouts}",
"\\tikzset{tooltip/.style = {rectangle callout,draw,callout absolute pointer = {(-2em, 1em)}}}",
"\\def\\tooltiptarget{\\phantom{\\rule{1mm}{1mm}}}",
"\\newbox\\tempboxa",
"\\newcommand\\tooltip[1]{\\def\\tooltipcallout{\\tikz{\\node[tooltip]{#1};}}\\setbox\\tempboxa=\\hbox{\\phantom{\\tooltipcallout}}\\immediate\\pdfxform\\tempboxa\\edef\\emptyicon{\\the\\pdflastxform}\\setbox\\tempboxa=\\hbox{\\tooltipcallout}\\immediate\\pdfxform\\tempboxa\\edef\\tooltipicon{\\the\\pdflastxform}\\pdfstartlink user{/Subtype /Text/Contents (#1)/AP <</N \\emptyicon\\space 0 R/R \\tooltipicon\\space 0 R>>}\\tooltiptarget\\pdfendlink}"
))
Step 2
The R-level code is unchanged.
Step 3
Let's try a slightly more complicated graph:
tikz('tooltips_with_callouts.tex', standAlone = TRUE)
x <- 1:10
y <- runif(10, 0, 10)
plot(x,y)
place_PDF_tooltip(x,y,x)
dev.off()
require(tools)
texi2dvi('tooltips_with_callouts.tex', pdf = TRUE)
Step 4
The result (download a PDF):
As you can see, there is an issue with both the tooltip and the callout being displayed. Setting /Contents () so that the tooltip has an empty string won't help anything. This can probably be solved by using a different annotation type, but I'm not going to spend any more time on it at the moment.
Caveats
Lots of TeX commands contain backslashes, so you will need to double the backslashes when expressing things in R code.
Certain characters are special to TeX, such as _, ^, and %. You will need to ensure these are properly escaped when using the tikzDevice.
Even though PDF is supposed to be superior to HTML in that it has a consistent rendering across platforms, your mileage will vary significantly depending on which viewer is being used. The screenshots were taken in Acrobat 8 on OS X. Preview also did a passable job but did not render the rollover callout. On Linux, xpdf didn't render anything and okular showed a tooltip, but did not suppress the tooltip icon and displayed a stock icon that looked a little garish in the middle of a plot.
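For the escaping caveat, here is a minimal sketch of what "properly escaped" could look like, written in Java purely for illustration (the class and method names are hypothetical). It uses the usual LaTeX replacements for ordinary text; I have not checked how each of them behaves inside the raw PDF string of the tooltip macro, so treat that as an open assumption.

```java
// Hypothetical helper: escapes characters that are special in LaTeX text.
final class TexEscape {
    static String escape(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '#': case '$': case '%': case '&':
                case '_': case '{': case '}':
                    // These only need a leading backslash.
                    out.append('\\').append(c);
                    break;
                case '^':
                    out.append("\\^{}");
                    break;
                case '~':
                    out.append("\\~{}");
                    break;
                case '\\':
                    out.append("\\textbackslash{}");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("50%_done"));
    }
}
```

If the escaped text is then pasted into an R string literal, the first caveat applies on top: every backslash has to be doubled again.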
Alternative Implementations
cooltooltips and fancytooltips are LaTeX packages that provide tooltip functionality that could probably be used from the tikzDevice. Some ideas used in the above examples were taken from cooltooltips.
Concluding Thoughts
Was this worth it? Well, by the end of the day I did learn two things:
I am not a PDF expert and I had to search a couple mailing lists and digest some parts of a 700+ page specification to even answer the question "is this possible?" let alone come up with an implementation.
I am not a SVG expert and yet I know that tooltips in SVG is not a question of "is this possible?" but rather "how sexy do you want it to look?".
Given that observation, I say that it is getting to be time for PDF to ride off into the sunset and take its 700 page spec with it. We need a new page description markup language and personally I'm hoping SVG Print will be complete enough to do the job. I would even accept Adobe Mars if the specification is accessible enough. Just give us an XML-based format that can exploit the full power of today's JavaScript libraries and then some real innovation can begin. A LuaTeX backend would be a nice bonus.
References
tikzDevice: A device for outputting R graphics as LaTeX code.
comp.text.tex: Source of much insight into esoteric TeX details. Posts by Herbert Voß and Mycroft provided implementation ideas.
The pdfTeX manual: Source of information concerning TeX commands that can generate raw PDF code, such as \pdfstartlink
TeX by Topic: Go-to manual for low-level TeX details such as how \immediate actually works.
The Adobe PDF Specification: The lair of details concerning PDF primitives.
The TikZ Manual: Quite possibly the finest example of open source documentation ever written.
Well, not PDF, but you could include tooltips (among others) in SVG format with the SVGAnnotation or RSVGTipsDevice package.
The pdf2 package works fine for me. (https://r-forge.r-project.org/R/?group_id=193). You can just include a 'popup' in the regular text command. It's not current as of 2.14, but I imagine he'll get round to it before too long.
text(x,y,'hard copy text',popup='tooltip text')

iText PDF: replace / transform colours

I'm using iText in Java to select a few pages out of a big PDF document and save as a new, smaller PDF. At the same time I'd like to change their colours.
For example, suppose all the colours on my pages are shades of grey, and I'd like to make them green: each grey should be replaced with a corresponding shade of green.
Mark Storer asks:
What exactly are you trying to accomplish?
Turn this... into this:
I have some documents, on which I'm already using iText to select a smaller set of pages from the document based on user input - cutting more than 100 pages down to about 5. At the same time I wish to produce green, blue, yellow, pink etc versions of them. Not every page is in grayscale, but all the ones that matter are, so I can force their colour space if need be.
Update:
Following Mark Storer's suggestion of blending modes, here's what I have:
val reader = new PdfReader(file.toURL)
val document = new Document
val writer = PdfWriter.getInstance(document, outputStream)
document.open()
/* draw a white background behind the page, so the
blend always has something to transform, otherwise
it just fills. */
val canvas = writer.getDirectContent
canvas.setColorFill(new CMYKColor(0.0f, 0.0f, 0.0f, 0.0f))
canvas.rectangle(10f, 0f, 100f, 100f)
canvas.fill
/* Put the imported page on top of that */
val page = writer.getImportedPage(reader, 1)
canvas.addTemplate(page, 0, 0)
/* Fill a box with colour and a blending mode */
canvas.setColorFill(new CMYKColor(0.6f,0.1f,0.0f,0.5f))
val gstate = new PdfGState
gstate.setBlendMode(PdfGState.BM_SCREEN)
canvas.setGState(gstate)
canvas.rectangle(0f, 0f, 100f, 100f)
canvas.fill
document.close()
(It's in Scala, but the iText library is just the same as in Java)
The problem is, all the blending modes iText has available are "separable" modes: they operate on each colour channel independently. That means I can separately adjust the cyan, magenta, yellow or black values, but I can't turn gray into green.
To do that, I'd need to use the Color blending mode, which is "non-separable", i.e. the colour channels affect each other. As far as I can tell, iText doesn't offer that: none of the non-separable blending modes are listed among the constants in PdfGState. I'm using iText 5.0.5, which is the latest version as of writing this.
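To show what I mean by separable versus non-separable, here is a sketch in plain Java of the Screen formula and of the Color mode, as I understand them from the PDF specification's transparency chapter (RGB components in the range 0 to 1). The class and method names are my own, and this is not iText code.

```java
// Hypothetical helper: blend formulas on RGB components in [0, 1].
final class BlendModes {
    // Separable: applied to each channel on its own.
    static double screen(double backdrop, double source) {
        return backdrop + source - backdrop * source;
    }

    // Luminosity of an RGB colour.
    static double lum(double[] c) {
        return 0.3 * c[0] + 0.59 * c[1] + 0.11 * c[2];
    }

    // Pulls out-of-range components back into [0, 1] while keeping the luminosity.
    static double[] clipColor(double[] c) {
        double l = lum(c);
        double n = Math.min(c[0], Math.min(c[1], c[2]));
        double x = Math.max(c[0], Math.max(c[1], c[2]));
        double[] out = c.clone();
        for (int i = 0; i < 3; i++) {
            if (n < 0) {
                out[i] = l + (out[i] - l) * l / (l - n);
            }
            if (x > 1) {
                out[i] = l + (out[i] - l) * (1 - l) / (x - l);
            }
        }
        return out;
    }

    // Shifts a colour so that its luminosity becomes l.
    static double[] setLum(double[] c, double l) {
        double d = l - lum(c);
        return clipColor(new double[] {c[0] + d, c[1] + d, c[2] + d});
    }

    // Non-separable Color mode: hue and saturation of the source, luminosity of the backdrop.
    static double[] color(double[] backdrop, double[] source) {
        return setLum(source, lum(backdrop));
    }

    public static void main(String[] args) {
        double[] result = color(new double[] {0.5, 0.5, 0.5}, new double[] {0.0, 1.0, 0.0});
        System.out.println(result[0] + " " + result[1] + " " + result[2]);
    }
}
```

A mid grey backdrop blended with pure green in Color mode comes out as a green with the grey's luminosity, which is exactly the grey-to-green mapping I'm after.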
Is there a way of accessing these blending modes in iText, or even of hacking them in? Is there another way of achieving the result?
This Adobe document describes blending modes.
Update:
Even setting the blend mode to Color didn't work. I did this in code to force it:
val gstate = new PdfGState
gstate.put(PdfName.BM, new PdfName("Color"))
canvas.setGState(gstate)
and I checked the resulting PDF in a text editor to make sure it said the right thing. Sadly the result on screen just didn't work. I've no idea why, according to the PDF specification that should be the right blend mode.
Mark Storer asks:
"Color" didn't work? Funky. Can we see the PDF?
Here's a PDF.
Putting it on the web, I can now see that the "Color" mode works correctly in Chrome, but doesn't work in Acrobat 9 Pro (CS4). So the technique is correct, but Adobe fails at rendering!
I wonder if there isn't some way of "flattening" the effect of the blending mode, so the PDF contains the desired colour object directly rather than a blending intended to result in the right colour.
Idea: Turn this upside down. Use the existing page as an alpha channel on a page filled
entirely with the desired color rather than the other way around.
How? I'm not sure the GState applies to adding a template?
Also, the imported page would need the white background adding first, or it will simply flood with colour wherever there isn't an object rather than blending.
I tried doing this:
val canvas = writer.getDirectContent
canvas.setColorFill(new CMYKColor(0.6f,0.1f,0.0f,0.0f))
canvas.rectangle(10f, 0f, 500f, 500f)
canvas.fill
val template = canvas.createTemplate(500f, 500f)
template.setColorFill(new CMYKColor(0f, 0f, 0f, 0f))
template.rectangle(0f, 0f, 500f, 500f)
template.fill
val page = writer.getImportedPage(reader, 1)
template.addTemplate(page, 0, 0)
val gstate = new PdfGState
gstate.put(PdfName.BM, new PdfName("Color"))
canvas.setGState(gstate)
canvas.addTemplate(template, 0, 0)
And here's the PDF it produced. Not quite right, either in Chrome or Acrobat :)
Edit: Silly me. I changed the mode to "Luminosity", producing this file. As before, this looks correct in Chrome but not in Acrobat.
I just checked, and even Adobe Reader X doesn't render it properly. Which probably means what I'm asking for is impossible. :(
Solution
Leonard Rosenthol from Adobe got back to me and clarified the problem: the "Color" blend mode only works when the transformation space is RGB, not CMYK. My PDF wasn't specifying the space, so Adobe products default to CMYK, while others default to RGB.
The solution in iText was to add this line near the top:
writer.setRgbTransparencyBlending(true)
Of course, for the sake of colour accuracy you don't want any more colour space conversions than absolutely necessary, so only use this line if you really do need to use RGB blend modes.
The colour pages produced look a little strange from a Photoshop user's perspective: it looks like light greys have been made more saturated than dark greys. I'm investigating ways of combining filters to adjust that output.
Here's the result!
Many thanks to Mark Storer for helping me reach this solution.
If you consistently want to go from "shades of gray" to "shades of Color X", you might be able to use transparency with some funky blending mode.
If you want to go through all the content streams and edit the existing color commands, that's a pretty tall order. You have to take into account a Wide Variety of colo(u)r spaces. DeviceGray, DeviceRGB, DeviceCMYK, ICC profiles, calibrated RGB and CMYK, spot colors, and So Forth.
What exactly are you trying to accomplish?
"Color" didn't work? Funky. Can we see the PDF?
Idea: Turn this upside down. Use the existing page as an alpha channel on a page filled entirely with the desired color rather than the other way around.
One more try. Rather than blending, use a transfer function. You'd need to build a Function dictionary. You're sticking with CMYK, so cramming any and all inputs into a specific output should be fairly simple.
something like this:
C: [0 1] -> [0 0.6]
M: [0 1] -> [0 0.1]
Y: [0 1] -> [0 0]
K: [0 1] -> [0 0]
(I swiped the 0.6 0.1 0 0 from your PDF)
Urgh... only your existing page is all deviceGray, right? Nope... CMYK as well, only just K's. You'd need a transfer function that took the K values, and mapped them to CMYK based on the color output you wanted.
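In plain arithmetic, the mapping I have in mind looks like the sketch below. This is Java for illustration only, not a PDF Function dictionary and not iText code; the class and method names are made up, and the target values are the 0.6 0.1 0 0 from above.

```java
// Hypothetical helper: maps a K-only grey value to a tinted CMYK colour.
final class GrayToTint {
    // Scales the target CMYK colour by the K value, clamped to [0, 1].
    static double[] map(double k, double[] targetCmyk) {
        double t = Math.max(0.0, Math.min(1.0, k));
        double[] out = new double[4];
        for (int i = 0; i < 4; i++) {
            out[i] = t * targetCmyk[i];
        }
        return out;
    }

    public static void main(String[] args) {
        double[] target = {0.6, 0.1, 0.0, 0.0};
        double[] mid = map(0.5, target);
        System.out.println(mid[0] + " " + mid[1] + " " + mid[2] + " " + mid[3]);
    }
}
```

So K = 0 (white) stays white, K = 1 (black) becomes the full target colour, and everything in between is a proportional tint.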
And then I looked at how you define a function in PDF. So much for simple. Domains and ranges and samples oh my! Not exactly trivial.
Still, this just might work.
(though I still think you should find a blended PDF that works in Acrobat and see what the differences are)
Last ditch effort:
PM Leonard Rosenthol. He has an account here on SO. He's the Acrobat developer relations guy for Adobe. Tell him that Mark Storer is stumped. That should get his attention. ;)