iText throws ClassCastException: PdfNumber cannot be cast to PdfLiteral - pdf

I am using iText v5.5.1 to read a PDF and extract plain text from it:
pdfReader = new PdfReader(new CloseShieldInputStream(is));
pdfParser = new PdfReaderContentParser(pdfReader);
int maxPageNumber = pdfReader.getNumberOfPages();
int pageNumber = 1;
StringBuilder sb = new StringBuilder();
SimpleTextExtractionStrategy extractionStrategy = new SimpleTextExtractionStrategy();
while (pageNumber <= maxPageNumber) {
    pdfParser.processContent(pageNumber, extractionStrategy);
    sb.append(extractionStrategy.getText());
    pageNumber++;
}
On one PDF file the following exception is thrown:
java.lang.ClassCastException: com.itextpdf.text.pdf.PdfNumber cannot be cast to com.itextpdf.text.pdf.PdfLiteral
at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.processContent(PdfContentStreamProcessor.java:382)
at com.itextpdf.text.pdf.parser.PdfReaderContentParser.processContent(PdfReaderContentParser.java:80)
That PDF file seems to be broken, but maybe its contents still makes sense...

Indeed
That PDF file seems to be broken
The content streams of all pages look like this:
/GS1 gs
q
595.00 0 0
It looks like they are all cut off early, as the last line is not a complete operation. This can certainly make a parser hiccup, as iText does.
Furthermore the content should be longer because even the size of their compressed stream is a bit larger than the length of this. This indicates streams broken on the byte level.
Looking at the bytes of the PDF file one cannot help but notice that
even inside binary streams the codes 13 and 10 only occur together and
cross-reference offset values are less than the actual positions.
So I assume that this PDF has been transmitted using a transport method handling it as textual data, especially replacing any kind of assumed line break (CR or LF or CR LF) with the CR LF now omnipresent in the file (CR = Carriage Return = 13; LF = Line Feed = 10). Such replacements will automatically break any compressed data stream like the content streams in your file.
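The two observations above can be checked mechanically. The following is a minimal sketch in plain Java (the class and method names are made up for illustration and are not part of iText) that reports whether the bytes 13 and 10 ever occur outside a CR LF pair. A healthy PDF with compressed streams usually contains lone 13 or 10 bytes somewhere in its binary data, so a result of true for a sizeable file is a hint, not proof, that line breaks were normalized in transit.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

final class LineBreakCheck {

    /**
     * Returns true if every CR (13) is immediately followed by LF (10)
     * and no LF occurs without a preceding CR.
     */
    static boolean onlyCrLfPairs(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == 13) {
                if (i + 1 >= data.length || data[i + 1] != 10) {
                    return false; // lone CR
                }
                i++; // skip the LF that belongs to this pair
            } else if (data[i] == 10) {
                return false; // lone LF
            }
        }
        return true;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java LineBreakCheck.java <file.pdf>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(onlyCrLfPairs(data)
                ? "CR and LF only occur as CR LF pairs (suspicious for a binary file)"
                : "lone CR or LF bytes found (expected for binary data)");
    }
}
```

Note that a file without any line break bytes at all also yields true, so the check is only meaningful for files of realistic size.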
Unfortunately, though...
but maybe its contents still makes sense
Not much. There is one big image associated to each page respectively. Considering the small size of the content streams and the large image size I would assume that the PDF only contains scanned pages. But the images also are broken due to the replacements mentioned above.

This isn't the best solution, but I had this exact problem and unfortunately can't share the exact PDFs I was having issues with.
I made a fork of itextpdf that catches the ClassCastException and just skips PdfObjects that it takes issue with. It prints to System.out what the text contained and what type itextpdf thinks it was. I haven't been able to map this out to some systemic problem with my PDFs (someone smarter than me will need to do that), and this exception only happens once in a blue moon. Anyway, in case it helps anyone, this fork at least doesn't crash your code, lets you parse the majority of your PDFs, and gives you a bit of info on what types of bytestrings seem to give itextpdf indigestion.
https://github.com/njhwang/itextpdf

how to get pdf origin contents using itext

I will make the problem concrete. I currently have three PDFs.
The first PDF is a plain PDF without any signature. The link is as follows:
https://drive.google.com/file/d/14gPZaL2AClRlPb5R2FQob4BBw31vvqYk/view?usp=sharing
The second PDF is the first PDF after I digitally signed it using Adobe Acrobat DC. The link is here:
https://drive.google.com/file/d/1CSrWV7SKrWUAJAf2uhwRZ8ephGa_uYYs/view?usp=sharing
The third PDF is generated from the second one, using the code you once provided:
com.itextpdf.kernel.pdf.PdfReader pdfReader = new com.itextpdf.kernel.pdf.PdfReader(
        new FileInputStream("C:\\Users\\Dell\\Desktop\\test2.pdf"));
com.itextpdf.kernel.pdf.PdfDocument pdfDocument = new com.itextpdf.kernel.pdf.PdfDocument(pdfReader);
SignatureUtil signatureUtil = new SignatureUtil(pdfDocument);
for (String name : signatureUtil.getSignatureNames()) {
    System.out.println(name);
    PdfSignature signature = signatureUtil.getSignature(name);
    PdfArray b = signature.getByteRange();
    long[] longs = b.asLongArray();
    RandomAccessFileOrArray rf = pdfReader.getSafeFile();
    try (InputStream rg = new RASInputStream(new RandomAccessSourceFactory().createRanged(rf.createSourceView(), longs));
         ByteArrayOutputStream byteArrayOutputStream = new com.itextpdf.io.source.ByteArrayOutputStream()) {
        byte[] buf = new byte[8192];
        int rd;
        while ((rd = rg.read(buf, 0, buf.length)) > 0) {
            byteArrayOutputStream.write(buf, 0, rd);
        }
        byte[] bytes1 = byteArrayOutputStream.toByteArray();
        String s2 = DatatypeConverter.printBase64Binary(bytes1);
    }
}
Processing the second PDF this way yields the third PDF in Base64-encoded form. The third PDF is here: https://drive.google.com/file/d/1LSbZpaVT9GrfotXplmKWl6HaCvxmaoH9/view?usp=sharing
My question: is there a method that takes the first PDF as input and produces the third PDF as output?
If I understand you correctly, you start with an unsigned PDF document test1.pdf. You sign it using Adobe Acrobat and get a signed PDF document test2.pdf. Then you apply your code to that signed PDF and get a file test3.pdf.
And now you wonder whether you can get test3.pdf immediately from test1.pdf some other way, independent from the specific signing step done in Adobe Acrobat.
This is not possible in practice.
Signing a PDF does not merely append a few signature related attributes, it can completely re-organize the PDF internally!
For example, your original test1.pdf is a normally saved PDF with cross reference tables. Adobe Acrobat saved the signed document as a linearized PDF with object streams and cross reference streams. Also all the PDF objects are renumbered. This causes a byte-wise comparison of test1.pdf and test2.pdf to hardly find any similarities.
All these changes are not necessary for signing but merely represent Acrobat's preferred way of saving a hitherto unsigned PDF. Thus, after the next program update Acrobat may or may not change this behavior completely without prior notice.
But even if Acrobat only saved necessary changes (whenever it saves as an incremental update, it forgoes most unnecessary changes), there would still be multiple valid ways to format them.
Additionally there are multiple date and version information pieces. E.g. signing, creation, and modification time; also the signature in test2.pdf claims to have been created by Adobe Acrobat Pro DC version 2018.011.20038. A small change in the software used or in the timing of the use will create different information in the result file.
And as the output of your code, your third file, contains everything of test2.pdf except the embedded signature container, all the changes mentioned above are also in your third file.
Concerning the terms you use:
You call the output of the code you posted original content or original text (in your previous question here). This is a bit of a misnomer because that output does contain all the changes introduced by the signing program, in your example all the re-organization of the objects in the PDF by Adobe Acrobat, so it is not really original. This output merely consists of the signed bytes, i.e. the signed byte ranges of the signed PDF.
Furthermore, you call that output a pdf. Strictly speaking it is not a PDF anymore, at least not a valid one. By removal of (the placeholder for) the signature container, the signature dictionary is broken and all offsets in the file after that missing value have shifted.
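To illustrate what "signed byte ranges" means, here is a minimal sketch in plain Java, without iText. It assumes the usual layout of a signature's ByteRange array, alternating offset and length values; the class and method names are hypothetical. It simply concatenates the referenced ranges, which is why the result lacks the signature container and why offsets after the gap no longer match.

```java
import java.io.ByteArrayOutputStream;

final class SignedBytes {

    /**
     * Concatenates the (offset, length) pairs of a ByteRange array,
     * e.g. {0, 4, 9, 4}, taken from the given file content.
     */
    static byte[] extract(byte[] file, long[] byteRange) {
        if (byteRange.length % 2 != 0) {
            throw new IllegalArgumentException("ByteRange must contain offset/length pairs");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < byteRange.length; i += 2) {
            int offset = Math.toIntExact(byteRange[i]);
            int length = Math.toIntExact(byteRange[i + 1]);
            out.write(file, offset, length); // throws IndexOutOfBoundsException on a bad range
        }
        return out.toByteArray();
    }
}
```

With a toy "file" of AAAA<sig>BBBB and the byte range {0, 4, 9, 4}, the result is AAAABBBB: everything except the gap, and four bytes after offset 4 now sit where the gap used to start.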

Buffer Not Large enough for pixel

I am trying to get a bitmap from a byte array:
val bitmap_tmp =
Bitmap.createBitmap(height, width, Bitmap.Config.ARGB_8888)
val buffer = ByteBuffer.wrap(decryptedText)
bitmap_tmp.copyPixelsFromBuffer(buffer)
callback.bitmap(bitmap_tmp)
I am getting an error at the line below:
bitmap_tmp.copyPixelsFromBuffer(buffer)
The error reads:
java.lang.RuntimeException: Buffer not large enough for pixels
I have tried different solutions found on Stack Overflow, such as adding the following line before the failing call, but it still crashes:
buffer.rewind()
However, the weird part is that the same code works perfectly in a different place for the same image (same dimensions) and I get the bitmap, but here it crashes.
How do I solve this?
Thanks in advance.
The error message makes it sound like the buffer you're copying from isn't large enough, like it needs to contain at least as many bytes as necessary to overwrite every pixel in your bitmap (which has a set size and pixel config).
The documentation for the method doesn't make it clear, but here's the source for the Bitmap class, and in that method:
if (bufferBytes < bitmapBytes) {
throw new RuntimeException("Buffer not large enough for pixels");
}
So yeah, you can't partially overwrite the bitmap, you need enough data to fill it. And if you check the source, that depends on the buffer's current position and limit (it's not just its capacity, it's how much data is remaining to be read).
If it works elsewhere, I'm guessing decryptedText is different there, or maybe you're creating your Bitmap with a different Bitmap.Config (like ARGB_8888 requires 4 bytes per pixel)
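A quick way to confirm this before calling copyPixelsFromBuffer is to compare the bytes remaining in the buffer with what the bitmap needs. The sketch below is plain Java with no Android dependency; the class and method names are made up, and it assumes ARGB_8888 with rows stored without padding (4 bytes per pixel, so width * height * 4 bytes in total).

```java
import java.nio.ByteBuffer;

final class PixelBufferCheck {

    // Assumption: ARGB_8888 stores each pixel in 4 bytes.
    static final int BYTES_PER_PIXEL_ARGB_8888 = 4;

    /** Number of bytes an ARGB_8888 bitmap of the given size needs. */
    static long requiredBytes(int width, int height) {
        return (long) width * height * BYTES_PER_PIXEL_ARGB_8888;
    }

    /** Compares the bytes left between position and limit with the required size. */
    static boolean isLargeEnough(ByteBuffer buffer, int width, int height) {
        return buffer.remaining() >= requiredBytes(width, height);
    }
}
```

If isLargeEnough returns false for decryptedText at the failing call site, the decrypted data is shorter than the bitmap (for instance because it is still a compressed image such as JPEG or PNG rather than raw pixels), and rewinding the buffer will not help.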

iText's Alt-Text adding sample code not working for PDFs tagged using Acrobat

I'm working on a PDF accessibility assignment, which is to add alternative text in a tagged PDF. I got the sample code for the same at: Add alternative text for an image in tagged pdf (PDF/UA) using iText
I was very excited that my task would be finished quickly, without much R&D.
Created a Java project based on the code, and when I executed it, it worked perfectly for the input PDF used in iText.
Unfortunately, the same source code is not working with PDFs tagged using Acrobat.
Sample Inputs: iText PDF: no_alt_attribute.pdf   &   My PDF: SARO_Sample_v1.7.pdf
Issue:
// This line works and returns RootElement
PdfDictionary structTreeRoot = catalog.getAsDict(PdfName.STRUCTTREEROOT);
// --> This line always returns NULL,
// Instead of returning the child elements of RootElement
PdfArray kids = structTreeRoot.getAsArray(PdfName.K);
// --> As per the structure Kids are present
Compared the structure of both PDFs and the following are my observations:
Tagging Structure - exactly the same in both PDFs.
Content Structure - almost the same, but a few additions are present in the PDF created by me.
Tag Tree Structure - almost the same with respect to tags, but with a major difference: iText's PDF tags are marked with /T:StructElem whereas that's not found in my PDF. Even re-tagging doesn't help.
Verified with various tagged PDFs available with us and all are similar (without /T:StructElem). These PDFs are validated and have passed accessibility compliance.
Need some thoughts on how to make this source code work with the PDFs we have. Alternatively, I need a way to ADD the missing /T:StructElem automatically in the PDFs while tagging in Acrobat.
Any help will be much appreciated!
Please do let me know if any further information is needed.
Note: I'm still not sure adding this /T:StructElem will work, since the PDFs passed PAC.
If this were really an issue, those PDFs wouldn't pass the validations, right? But this is the only difference I found between those two PDFs.
PS: The Acrobat version I'm using is "Adobe Acrobat (Pro) DC."
-- Thanks,SaRaVaNaN
Bruno's code in the referenced answer does not walk the whole structure tree because he did not implement all cases of the K contents. The structure element K entry is specified like this:
The children of this structure element. The value of this entry may be one of the following objects or an array consisting of one or more of the following objects in any combination: [...]
(ISO 32000-2, Table 355 — Entries in a structure element dictionary)
Bruno's code, though, always assumes the value to be an array:
PdfArray kids = element.getAsArray(PdfName.K);
(Most likely he implemented that code with just the structure tree of the PDF in question there in mind.)
Thus, replace
PdfArray kids = element.getAsArray(PdfName.K);
if (kids == null) return;
for (int i = 0; i < kids.size(); i++)
    manipulate(kids.getAsDict(i));
by something like
PdfObject kid = element.getDirectObject(PdfName.K);
if (kid instanceof PdfDictionary) {
    manipulate((PdfDictionary)kid);
} else if (kid instanceof PdfArray) {
    PdfArray kids = (PdfArray)kid;
    for (int i = 0; i < kids.size(); i++)
        manipulate(kids.getAsDict(i));
}
As you did not share an example document, I could not test the code. If there are problems, please share an example PDF.
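For readers who want to see the "single object or array" handling in isolation, here is a small model of it in plain Java collections. This is not iText code: a Map stands in for a PDF dictionary, a List for a PDF array, and the class and method names are hypothetical. It also shows why kids that are neither dictionaries nor arrays (such as integer marked-content IDs) have to be skipped rather than cast.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class KidsWalker {

    /** Collects the element itself and every dictionary reachable through "K" entries. */
    static List<Map<String, Object>> collect(Map<String, Object> element) {
        List<Map<String, Object>> found = new ArrayList<>();
        walk(element, found);
        return found;
    }

    @SuppressWarnings("unchecked")
    private static void walk(Object node, List<Map<String, Object>> found) {
        if (node instanceof Map) {
            Map<String, Object> dict = (Map<String, Object>) node;
            found.add(dict);
            walk(dict.get("K"), found); // K may be absent, a single kid, or a list of kids
        } else if (node instanceof List) {
            for (Object kid : (List<?>) node) {
                walk(kid, found);
            }
        }
        // Anything else (null, an integer content ID, ...) is not a structure element: ignore it.
    }
}
```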

How to add text to an existing pdf at a fixed position?

I must insert a number at a fixed position in an existing A4 PDF.
I've tried the following as a first test, but it doesn't work (no text is added).
What goes wrong?
Here's my code:
byte[] omrMarks = omrFrame.getOmrImage();
Jpeg img = new Jpeg(omrMarks);
PdfImportedPage page = stamper.getImportedPage(source, pageNum);
PdfContentByte pageContent = stamper.getOverContent(pageNum);
pageContent.addImage(
img, img.getWidth(), 0, 0, img.getHeight(), 15f, (page.getHeight() - 312));
pageContent.moveTo(10, 200);
pageContent.beginText();
pageContent.setLiteral("Test");
pageContent.endText();
There are many issues with this question.
This is certainly wrong:
pageContent.moveTo(10, 200);
pageContent.beginText();
pageContent.setLiteral("Test");
pageContent.endText();
The moveTo() method doesn't make sense; it has no effect on the text state object.
The text state object is illegal because there's no setFontAndSize() (it's very odd that this doesn't throw a RuntimeException, are you using an obsolete version of iText?)
The setLiteral() method should only be used to add some literal PDF syntax to a content stream.
For instance, something like:
pageContent.setLiteral("\n100 100 m\n100 200 l\nS\n");
should only be used if you understand that the following PDF syntax draws a line:
100 100 m
100 200 l
S
It's clear from your question that you don't understand PDF syntax, so you shouldn't use these methods. Instead you should use convenience methods such as the showTextAligned() method, which hide the complexity of PDF and save you a couple of lines.
Maybe you have a good reason to opt for the "hard way", but in that case, you should read the documentation, otherwise you'll continue using methods such as setLiteral() instead of showText(), moveTo() instead of moveText(), and so on, resulting in code you don't want your employer to see.
Furthermore, you're making the assumption that the lower left corner of the page has the coordinates (0,0). That's probably true for the majority of PDF documents found in the wild, but that's not true for all PDF documents. The MediaBox doesn't have to be [0 0 595 842], it could as well be [595 842 1190 1684]. Moreover: what if there's a CropBox? Maybe you're adding content that isn't visible because it's cropped away...
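To make the coordinate issue concrete, here is a tiny sketch in plain Java (hypothetical class and method names; a box is modelled as {llx, lly, urx, ury}). In iText 5 the actual values could be read from the reader, for example via PdfReader.getCropBox(pageNum), but that part is left out so the example stays self-contained.

```java
final class PageBoxOffset {

    /** Translates an offset from the lower-left corner of the box into page coordinates. */
    static float[] fromLowerLeft(float[] box, float dx, float dy) {
        return new float[] { box[0] + dx, box[1] + dy };
    }

    /** Returns true if the point lies inside the box (borders included). */
    static boolean isInside(float[] box, float x, float y) {
        return x >= box[0] && x <= box[2] && y >= box[1] && y <= box[3];
    }

    public static void main(String[] args) {
        float[] mediaBox = { 595f, 842f, 1190f, 1684f };

        // The naive position (10, 200) is outside this page entirely.
        System.out.println(isInside(mediaBox, 10f, 200f)); // false

        // Offsetting from the box's lower-left corner gives a visible position.
        float[] p = fromLowerLeft(mediaBox, 10f, 200f);
        System.out.println(p[0] + ", " + p[1]); // 605.0, 1042.0
    }
}
```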

Create pdf with tooltips in R

Simple question: Is there a way to plot a graph from R in a pdf file and include tooltips?
Simple question: Is there a way to plot a graph from R in a pdf file and include tooltips?
There's always a way. But the devil is in the details, so the real question is: are you willing to get your hands dirty?
PDF does support tooltips for certain kinds of objects such as hyperlinks. So, if there is a way to insert raw PDF statements indicating there should be a hyperlink-like object at some position in your plot, then there is a way to pop up a tooltip.
Now, the only way I know of to generate and compile raw PDF statements is to create the document using TeX. There are definitely other ways to do it, but this is the one I am familiar with. The following examples use a graphics device that Cameron Bracken and I wrote, the tikzDevice, to render R graphics to LaTeX code. The tikzDevice has preliminary support for injecting arbitrary LaTeX code into the graphics output stream through the tikzAnnotate function---we will be using this to drop PDF callouts into the plot.
The steps involved are:
Set up a LaTeX macro to generate the PDF commands required to produce a callout.
Set up an R function that uses tikzAnnotate to invoke the LaTeX macro at specific points in the plot.
???
Profit!
In the examples that follow, one major caveat is attached to step 2. The coordinate calculations used will only work with base graphics, not grid graphics such as ggplot2.
Simple Tooltips
Step 1
The tikzDevice allows you to create R graphics that include the execution of arbitrary LaTeX commands. Usually this is done to insert things like $\alpha$ into plot titles to generate greek letters, α, or complex equations. Here we are going to use this feature to invoke some raw PDF voodoo.
Any LaTeX macros that you wish to be available during the generation of a tikzDevice graphic need to be defined up-front by setting the tikzLatexPackages option. Here we are going to append some stuff to that declaration:
require(tikzDevice) # So that default options are set
options(tikzLatexPackages = c(
getOption('tikzLatexPackages'), # The original contents: required stuff
# Avert your eyes for a sec, all will be explained below
"\\def\\tooltiptarget{\\phantom{\\rule{1mm}{1mm}}}",
"\\newbox\\tempboxa\\setbox\\tempboxa=\\hbox{}\\immediate\\pdfxform\\tempboxa \\edef\\emptyicon{\\the\\pdflastxform}",
"\\newcommand\\tooltip[1]{\\pdfstartlink user{/Subtype /Text/Contents (#1)/AP <</N \\emptyicon\\space 0 R >>}\\tooltiptarget\\pdfendlink}"
))
If all that quoted nonsense were to be written out as LaTeX code by someone who cared about readability, it would look like this:
\def\tooltiptarget{\phantom{\rule{1mm}{1mm}}}
\newbox\tempboxa
\setbox\tempboxa=\hbox{}
\immediate\pdfxform\tempboxa
\edef\emptyicon{\the\pdflastxform}
\newcommand\tooltip[1]{%
\pdfstartlink user{%
/Subtype /Text
/Contents (#1)
/AP <<
/N \emptyicon\space 0 R
>>
}%
\tooltiptarget%
\pdfendlink%
}
For those programmers who have never taken a walk on the wild side and done some "programming" in TeX, here's a blow-by-blow for the above code (as I understand it anyway, TeX can get very weird---especially when you get down in the trenches with it):
Line 1: Define an object, tooltiptarget, which is non-visible (a phantom) and is a 1mm x 1mm rectangle (a rule). This will be the onscreen area which we will use to detect mouseovers.
Line 2: Create a new box, which is like a "page fragment" of typeset material. Can contain pretty much anything, including other boxes (sort of like an R list). Call it tempboxa.
Line 3: Assign the contents of tempboxa to contain an empty box that arranges its contents using a horizontal layout (which is unimportant, could have used a vbox or other box).
Line 4: Create a PDF Form XObject using the contents of tempboxa. A Form XObject can be used by PDF files to store graphics, like logos, that may be used over and over. Here we are creating a "blank icon" that we can use later to cut down on visual clutter. TeX defers output operations, like writing objects to a PDF file, until certain conditions have been met---such as a page has filled up. Immediate makes sure this operation is not deferred.
Line 5: This line captures an integer value that serves as a reference to the PDF XObject we just created and assigns it the name emptyicon.
Line 6: Starts the definition of a new macro called tooltip that takes 1 argument which is referred to in the body of the macro as #1. Each line in the macro ends with a comment character, %, to keep TeX from noticing the newlines that have been added for readability (newlines can have strange effects inside macros).
Line 7: Output raw PDF commands (pdfstartlink). This begins the creation of a new PDF annotation object (/Type /Annot) of which there are about 20 different subtypes---among them are hyperlinks. Every line following this contains raw PDF markup with a couple of TeX macros.
Line 8: Declare the annotation subtype we are going to use. Here I am going with a plain Text annotation such as a comment or sticky note.
Line 9: Declare the contents of the annotation. This will be the contents of our tooltip and is set to #1, the argument to the tooltip macro.
Lines 10-12: Normally text annotations are marked by an icon, such as a sticky note, to highlight their location in the text. This behavior will cause a visual mess if we allow it to splatter little sticky notes all over our graphs. Here we use an appearance dictionary (/AP << >>) to set the "normal" annotation icon (/N) to be the blank icon we created earlier. The integer we stored in emptyicon along with 0 R forms a reference to the Form XObject we made on Line 4 using an empty box.
Line 14: If we were making a normal hyperlink, here is where the text/image/whatever would go that would serve as the link body. Instead we insert tooltiptarget, our invisible phantom object which does not show up on the screen but does react to mouseovers.
Step 2
All right, now that we have told LaTeX how to create tooltips, it is time to make them usable from R. This involves writing a function that will take coordinates on our graph, such as (1,1), and convert them into canvas or "device" coordinates. In the case of the tikzDevice the required measurement is "TeX points" (1/72.27 of an inch) from the absolute bottom left of the plotting area. Fortunately for base graphics, there are handy functions to calculate this for us. Grid graphics work differently, so the approach taken in the examples here won't work for them.
The final task for our R function is to call tikzAnnotate to insert a TikZ "node" into the output stream that is located at the coordinates we computed. Nodes can contain arbitrary TeX commands, in this case we will be calling upon tooltip.
Here is an R function that contains the above functionality:
place_PDF_tooltip <- function(x, y, text){
  # Calculate coordinates
  tikzX <- round(grconvertX(x, to = "device"), 2)
  tikzY <- round(grconvertY(y, to = "device"), 2)
  # Insert node
  tikzAnnotate(paste(
    "\\node at (", tikzX, ",", tikzY, ") ",
    "{\\tooltip{", text, "}};",
    sep = ''
  ))
  invisible()
}
Step 3
Try it out on a plot:
# standAlone creates a complete LaTeX document. Default output makes
# stripped-down graphs meant for inclusion in other LaTeX documents
tikz('tooltips_ahoy.tex', standAlone = TRUE)
plot(1,1)
place_PDF_tooltip(1,1, 'Hello, World!')
dev.off()
require(tools)
texi2dvi('tooltips_ahoy.tex', pdf = TRUE)
Step 4
Behold the result (download a pdf):
Advanced Tooltips
Step 1
So, now that we have simple tooltips out of the way, why not crank it to 11? In the previous example, we used an empty hbox to get rid of the tooltip icon. But what if we had put something in that box, like text or a drawing? And what if there was a way to make it so that the icon only appeared during mouseover events?
The following TeX macro is a little rough around the edges, but it shows that this is possible:
\usetikzlibrary{shapes.callouts}
\tikzset{tooltip/.style = {
rectangle callout,
draw,
callout absolute pointer = {(-2em, 1em)}
}}
\def\tooltiptarget{\phantom{\rule{1mm}{1mm}}}
\newbox\tempboxa
\newcommand\tooltip[1]{%
\def\tooltipcallout{\tikz{\node[tooltip]{#1};}}%
\setbox\tempboxa=\hbox{\phantom{\tooltipcallout}}%
\immediate\pdfxform\tempboxa%
\edef\emptyicon{\the\pdflastxform}%
\setbox\tempboxa=\hbox{\tooltipcallout}%
\immediate\pdfxform\tempboxa%
\edef\tooltipicon{\the\pdflastxform}%
\pdfstartlink user{%
/Subtype /Text
/Contents (#1)
/AP <<
/N \emptyicon\space 0 R
/R \tooltipicon\space 0 R
>>
}%
\tooltiptarget%
\pdfendlink%
}
The following modifications have been made compared to the simple callout.
The shapes.callouts library is loaded which contains templates for TikZ to use when drawing callout boxes.
A tooltip style is defined which contains some TikZ graphics boilerplate. It specifies a rectangular callout box that is to be visible (draw). The callout absolute pointer business is a hack because I've had too many beers by this point to figure out how to place annotation icons using dynamically generated PDF primitives. This relies on the default anchoring of icons at their upper left corner and so pulls the pointer of the callout box toward that location. The result is that the boxes will always appear to the lower right of the pointer and if the callout text is long enough, they won't look right.
Inside the macro, the tooltip is generated using a one-shot tikz command that is stuffed into the tooltipcallout macro. A form XObject is generated from tooltipcallout and assigned to tooltipicon.
emptyicon is also dynamically generated by evaluating tooltipcallout inside of phantom. This is required because the size of the default icon apparently sets the viewport available for the rollover icon.
When generating PDF commands, a new entry is added to the /AP dictionary, /R for rollover, that uses the XObject referenced by tooltipicon.
The ready to consume R version is:
require(tikzDevice)
options(tikzLatexPackages = c(
getOption('tikzLatexPackages'),
"\\usetikzlibrary{shapes.callouts}",
"\\tikzset{tooltip/.style = {rectangle callout,draw,callout absolute pointer = {(-2em, 1em)}}}",
"\\def\\tooltiptarget{\\phantom{\\rule{1mm}{1mm}}}",
"\\newbox\\tempboxa",
"\\newcommand\\tooltip[1]{\\def\\tooltipcallout{\\tikz{\\node[tooltip]{#1};}}\\setbox\\tempboxa=\\hbox{\\phantom{\\tooltipcallout}}\\immediate\\pdfxform\\tempboxa\\edef\\emptyicon{\\the\\pdflastxform}\\setbox\\tempboxa=\\hbox{\\tooltipcallout}\\immediate\\pdfxform\\tempboxa\\edef\\tooltipicon{\\the\\pdflastxform}\\pdfstartlink user{/Subtype /Text/Contents (#1)/AP <</N \\emptyicon\\space 0 R/R \\tooltipicon\\space 0 R>>}\\tooltiptarget\\pdfendlink}"
))
Step 2
The R-level code is unchanged.
Step 3
Let's try a slightly more complicated graph:
tikz('tooltips_with_callouts.tex', standAlone = TRUE)
x <- 1:10
y <- runif(10, 0, 10)
plot(x,y)
place_PDF_tooltip(x,y,x)
dev.off()
require(tools)
texi2dvi('tooltips_with_callouts.tex', pdf = TRUE)
Step 4
The result (download a PDF):
As you can see, there is an issue with both the tooltip and the callout being displayed. Setting /Contents () so that the tooltip has an empty string won't help anything. This can probably be solved by using a different annotation type, but I'm not going to spend any more time on it at the moment.
Caveats
Lots of TeX commands contain backslashes; you will need to double the backslashes when expressing things in R code.
Certain characters are special to TeX, such as _, ^, and %; you will need to ensure these are properly escaped when using the tikzDevice.
Even though PDF is supposed to be superior to HTML in that it has a consistent rendering across platforms, your mileage will vary significantly depending on which viewer is being used. The screenshots were taken in Acrobat 8 on OS X; Preview also did a passable job but did not render the rollover callout. On Linux, xpdf didn't render anything and okular showed a tooltip, but did not suppress the tooltip icon and displayed a stock icon that looked a little garish in the middle of a plot.
Alternative Implementations
cooltooltips and fancytooltips are LaTeX packages that provide tooltip functionality that could probably be used from the tikzDevice. Some ideas used in the above examples were taken from cooltooltips.
Concluding Thoughts
Was this worth it? Well, by the end of the day I did learn two things:
I am not a PDF expert and I had to search a couple mailing lists and digest some parts of a 700+ page specification to even answer the question "is this possible?" let alone come up with an implementation.
I am not a SVG expert and yet I know that tooltips in SVG is not a question of "is this possible?" but rather "how sexy do you want it to look?".
Given that observation, I say that it is getting to be time for PDF to ride off into the sunset and take its 700 page spec with it. We need a new page description markup language and personally I'm hoping SVG Print will be complete enough to do the job. I would even accept Adobe Mars if the specification is accessible enough. Just give us an XML-based format that can exploit the full power of today's JavaScript libraries and then some real innovation can begin. A LuaTeX backend would be a nice bonus.
References
tikzDevice: A device for outputting R graphics as LaTeX code.
comp.text.tex: Source of much insight into esoteric TeX details. Posts by Herbert Voß and Mycroft provided implementation ideas.
The pdfTeX manual: Source of information concerning TeX commands that can generate raw PDF code, such as \pdfstartlink
TeX by Topic: Go-to manual for low-level TeX details such as how \immediate actually works.
The Adobe PDF Specification: The lair of details concerning PDF primitives.
The TikZ Manual: Quite possibly the finest example of open source documentation ever written.
Well, not PDF, but you could include tooltips (among others) in SVG format with the SVGAnnotation or RSVGTipsDevice package.
The pdf2 package works fine for me. (https://r-forge.r-project.org/R/?group_id=193). You can just include a 'popup' in the regular text command. It's not current as of 2.14, but I imagine he'll get round to it before too long.
text(x,y,'hard copy text',popup='tooltip text')