To make the problem concrete: I currently have three PDFs.
The first PDF is a plain PDF without any signature. The link is as follows:
https://drive.google.com/file/d/14gPZaL2AClRlPb5R2FQob4BBw31vvqYk/view?usp=sharing
The second PDF is the first PDF digitally signed using Adobe Acrobat DC. The link is here:
https://drive.google.com/file/d/1CSrWV7SKrWUAJAf2uhwRZ8ephGa_uYYs/view?usp=sharing
The third PDF is generated as follows, using the code you once provided:
com.itextpdf.kernel.pdf.PdfReader pdfReader = new com.itextpdf.kernel.pdf.PdfReader(
        new FileInputStream("C:\\Users\\Dell\\Desktop\\test2.pdf"));
com.itextpdf.kernel.pdf.PdfDocument pdfDocument = new com.itextpdf.kernel.pdf.PdfDocument(pdfReader);
SignatureUtil signatureUtil = new SignatureUtil(pdfDocument);
for (String name : signatureUtil.getSignatureNames()) {
    System.out.println(name);
    PdfSignature signature = signatureUtil.getSignature(name);
    PdfArray b = signature.getByteRange();
    long[] longs = b.asLongArray();
    RandomAccessFileOrArray rf = pdfReader.getSafeFile();
    try (InputStream rg = new RASInputStream(new RandomAccessSourceFactory().createRanged(rf.createSourceView(), longs));
         ByteArrayOutputStream byteArrayOutputStream = new com.itextpdf.io.source.ByteArrayOutputStream();) {
        byte[] buf = new byte[8192];
        int rd;
        while ((rd = rg.read(buf, 0, buf.length)) > 0) {
            byteArrayOutputStream.write(buf, 0, rd);
        }
        byte[] bytes1 = byteArrayOutputStream.toByteArray();
        String s2 = DatatypeConverter.printBase64Binary(bytes1);
    }
}
Processing the second PDF with this code yields the Base64-encoded form of the third PDF. The link to the third PDF is https://drive.google.com/file/d/1LSbZpaVT9GrfotXplmKWl6HaCvxmaoH9/view?usp=sharing
My question: is there a method that takes the first PDF as input and produces the third PDF as output?
If I understand you correctly, you start with an unsigned PDF document test1.pdf. You sign it using Adobe Acrobat and get a signed PDF document test2.pdf. Then you apply your code to that signed PDF and get a file test3.pdf.
And now you wonder whether you can get test3.pdf immediately from test1.pdf some other way, independent from the specific signing step done in Adobe Acrobat.
This is not possible in practice.
Signing a PDF does not merely append a few signature-related attributes; it can completely re-organize the PDF internally!
For example, your original test1.pdf is a normally saved PDF with cross reference tables. Adobe Acrobat saved the signed document as a linearized PDF with object streams and cross reference streams. Also all the PDF objects are renumbered. This causes a byte-wise comparison of test1.pdf and test2.pdf to hardly find any similarities.
All these changes are not necessary for signing but merely represent Acrobat's preferred way of saving a hitherto unsigned PDF. Thus, after the next program update Acrobat may or may not change this behavior completely without prior notice.
But even if Acrobat only saved necessary changes (whenever it saves as an incremental update, it forgoes most unnecessary changes), there would still be multiple valid ways to format them.
Additionally there are multiple date and version information pieces. E.g. signing, creation, and modification time; also the signature in test2.pdf claims to have been created by Adobe Acrobat Pro DC version 2018.011.20038. A small change in the software used or in the timing of the use will create different information in the result file.
And as the output of your code, your third file, contains everything of test2.pdf except the embedded signature container, all the changes mentioned above are also in your third file.
Concerning the terms you use:
You call the output of the code you posted original content or original text (in your previous question here). This is a bit of a misnomer because that output does contain all the changes introduced by the signing program (in your example, all the re-organization of the objects in the PDF by Adobe Acrobat), so it is not really original. This output is merely the signed bytes, or signed byte ranges, of the signed PDF.
Furthermore, you call that output a pdf. Strictly speaking it is not a PDF anymore, at least not a valid one. By removal of (the placeholder for) the signature container, the signature dictionary is broken and all offsets in the file after that missing value have shifted.
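To illustrate what "signed byte ranges" means, here is a minimal sketch in plain Java, without iText. It assumes the ByteRange array holds (offset, length) pairs, as in your code above; the class name SignedRanges and the sample bytes are made up for the illustration.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

final class SignedRanges {
    /** Concatenates the byte ranges given as (offset, length) pairs, like a PDF /ByteRange array. */
    static byte[] extract(byte[] file, long[] byteRange) {
        if (byteRange.length % 2 != 0) {
            throw new IllegalArgumentException("ByteRange must hold offset/length pairs");
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < byteRange.length; i += 2) {
            int offset = Math.toIntExact(byteRange[i]);
            int length = Math.toIntExact(byteRange[i + 1]);
            out.write(file, offset, length); // everything between the ranges is skipped
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a signed file: "<sig>" plays the role of the signature container.
        byte[] file = "AAAA<sig>BBB".getBytes(StandardCharsets.US_ASCII);
        long[] byteRange = {0, 4, 9, 3};
        System.out.println(new String(extract(file, byteRange), StandardCharsets.US_ASCII)); // prints AAAABBB
    }
}
```

The result is simply the file with a gap cut out of it, which is why everything the signing software wrote around the signature is still in there.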
I'm working on a PDF accessibility assignment: adding alternative text in a tagged PDF. I found sample code for this in: Add alternative text for an image in tagged pdf (PDF/UA) using iText
I was very excited that my task would be finished quickly, without much R&D.
I created a Java project based on the code, and when I executed it, it worked perfectly for the input PDF used in the iText example.
Unfortunately, the same source code does not work with PDFs tagged using Acrobat.
Sample inputs: iText PDF: no_alt_attribute.pdf; my PDF: SARO_Sample_v1.7.pdf
Issue:
// This line works and returns RootElement
PdfDictionary structTreeRoot = catalog.getAsDict(PdfName.STRUCTTREEROOT);
// --> This line always returns NULL,
// Instead of returning the child elements of RootElement
PdfArray kids = structTreeRoot.getAsArray(PdfName.K);
// --> As per the structure Kids are present
I compared the structure of both PDFs; these are my observations:
Tagging Structure - exactly the same in both PDFs.
Content Structure - almost the same, but the PDF I created contains a few additions.
Tag Tree Structure - almost the same with respect to tags, but with one major difference: the tags in iText's PDF are marked with /T:StructElem, whereas that is not found in my PDF. Even re-tagging doesn't help.
Verified with various tagged PDFs available with us and all are similar (without /T:StructElem). These PDFs are validated and have passed accessibility compliance.
Need some thoughts on how to make this source code work with the PDFs we have. Alternatively, I need a way to ADD the missing /T:StructElem automatically in the PDFs while tagging in Acrobat.
Any help will be much appreciated!
Please do let me know if any further information is needed.
Note: I'm still not sure that adding this /T:StructElem will work, since the PDFs passed validation in PAC.
If this were really an issue, those PDFs would not pass validation, right? But this is the only difference I found between those two PDFs.
PS: The Acrobat version I'm using is "Adobe Acrobat (Pro) DC."
-- Thanks,SaRaVaNaN
Bruno's code in the referenced answer does not walk the whole structure tree because he did not implement all cases of the K contents. The structure element K entry is specified like this:
The children of this structure element. The value of this entry may be one of the following objects or an array consisting of one or more of the following objects in any combination: [...]
(ISO 32000-2, Table 355 — Entries in a structure element dictionary)
Bruno's code, though, always assumes the value to be an array:
PdfArray kids = element.getAsArray(PdfName.K);
(Most likely he implemented that code with just the structure tree of the PDF in question there in mind.)
Thus, replace
PdfArray kids = element.getAsArray(PdfName.K);
if (kids == null) return;
for (int i = 0; i < kids.size(); i++)
    manipulate(kids.getAsDict(i));
by something like
PdfObject kid = element.getDirectObject(PdfName.K);
if (kid instanceof PdfDictionary) {
    manipulate((PdfDictionary)kid);
} else if (kid instanceof PdfArray) {
    PdfArray kids = (PdfArray)kid;
    for (int i = 0; i < kids.size(); i++)
        manipulate(kids.getAsDict(i));
}
As you did not share an example document, I could not test the code. If there are problems, please share an example PDF.
I am using iText v5.5.1 to read a PDF and render plain text from it:
pdfReader = new PdfReader(new CloseShieldInputStream(is));
pdfParser = new PdfReaderContentParser(pdfReader);
int maxPageNumber = pdfReader.getNumberOfPages();
int pageNumber = 1;
StringBuilder sb = new StringBuilder();
SimpleTextExtractionStrategy extractionStrategy = new SimpleTextExtractionStrategy();
while (pageNumber <= maxPageNumber) {
    pdfParser.processContent(pageNumber, extractionStrategy);
    sb.append(extractionStrategy.getText());
    pageNumber++;
}
On one PDF file the following exception is thrown:
java.lang.ClassCastException: com.itextpdf.text.pdf.PdfNumber cannot be cast to com.itextpdf.text.pdf.PdfLiteral
at com.itextpdf.text.pdf.parser.PdfContentStreamProcessor.processContent(PdfContentStreamProcessor.java:382)
at com.itextpdf.text.pdf.parser.PdfReaderContentParser.processContent(PdfReaderContentParser.java:80)
That PDF file seems to be broken, but maybe its contents still make sense...
Indeed
That PDF file seems to be broken
The content streams of all pages look like this:
/GS1 gs
q
595.00 0 0
It looks like they are all cut off early, as the last line is not a complete operation. This can certainly make a parser hiccup, as iText's does.
Furthermore, the content should be longer: even the compressed streams are somewhat larger than this decoded content. This indicates that the streams are broken at the byte level.
Looking at the bytes of the PDF file one cannot help but notice that
even inside binary streams the codes 13 and 10 only occur together and
cross-reference offset values are less than the actual positions.
So I assume that this PDF has been transmitted using a transport method handling it as textual data, especially replacing any kind of assumed line break (CR or LF or CR LF) with the CR LF now omnipresent in the file (CR = Carriage Return = 13; LF = Line Feed = 10). Such replacements will automatically break any compressed data stream like the content streams in your file.
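The first observation can be checked mechanically. The following is only a heuristic sketch in plain Java (the class and method names are made up): it tests whether the byte values 13 and 10 ever occur other than as a CR LF pair. For a file with sizeable compressed streams, a result of true is suspicious, because binary data would normally contain lone 13 or 10 bytes somewhere; for a tiny or purely textual file it proves nothing.

```java
final class LineBreakDamage {
    /** Returns true if every CR (13) is directly followed by LF (10) and every LF directly preceded by CR. */
    static boolean onlyPairedCrLf(byte[] data) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == 13 && (i + 1 >= data.length || data[i + 1] != 10)) {
                return false; // lone CR
            }
            if (data[i] == 10 && (i == 0 || data[i - 1] != 13)) {
                return false; // lone LF
            }
        }
        return true;
    }

    public static void main(String[] args) throws java.io.IOException {
        // Usage: java LineBreakDamage some.pdf (the path is supplied by the caller)
        byte[] data = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(args[0]));
        System.out.println(onlyPairedCrLf(data)
                ? "Only CR LF pairs found; possibly damaged by a text-mode transfer"
                : "Lone CR or LF bytes found; no sign of line break normalization");
    }
}
```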
Unfortunately, though...
but maybe its contents still make sense
Not much. There is one big image associated with each page. Considering the small size of the content streams and the large image size, I would assume that the PDF only contains scanned pages. But the images are also broken due to the replacements mentioned above.
This isn't the best solution, but I had this exact problem and unfortunately can't share the exact PDFs I was having issues with.
I made a fork of itextpdf that catches the ClassCastException and just skips PdfObjects that it takes issue with. It prints to System.out what the text contained and what type itextpdf thinks it was. I haven't been able to map this out to some systemic problem with my PDFs (someone smarter than me will need to do that), and this exception only happens once in a blue moon. Anyway, in case it helps anyone, this fork at least doesn't crash your code, lets you parse the majority of your PDFs, and gives you a bit of info on what types of bytestrings seem to give itextpdf indigestion.
https://github.com/njhwang/itextpdf
I'm very confused about why iTextSharp can't read or get the images from a PDF (converted from MS Word, Excel, or PowerPoint).
Here's what I did: I opened an MS Word file, converted it to PDF, and then read the PDF file using iTextSharp. It doesn't recognize that the PDF file contains an image or shape.
I also tried converting from PowerPoint to PDF and then reading the PDF file; it doesn't read the images either.
Here's the code:
Below are the images (edited):
This is the image that can't be extracted:
This is an image that I tested a while ago and that works. I don't know why the other image can't be detected, or why it causes an error.
For now I have changed the code to the following, but it still can't detect that the circle shape is an image:
For pn As Integer = 1 To pc
    Dim pg As PdfDictionary = pdfr.GetPageN(pn)
    Dim res As PdfDictionary = DirectCast(PdfReader.GetPdfObject(pg.Get(PdfName.RESOURCES)), PdfDictionary)
    Dim xobj As PdfDictionary = DirectCast(PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT)), PdfDictionary)
    MessageBox.Show("THE ERROR IS HERE, IT BYPASS, SO XOBJ IS NOTHING IN THAT IMAGE")
    If xobj IsNot Nothing Then
        For Each name As PdfName In xobj.Keys
            Dim obj As PdfObject = xobj.Get(name)
            If obj.IsIndirect() Then
                Dim tg As PdfDictionary = DirectCast(PdfReader.GetPdfObject(obj), PdfDictionary)
                Dim type As PdfName = DirectCast(PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE)), PdfName)
                Dim XrefIndex As Integer = Convert.ToInt32(DirectCast(obj, PRIndirectReference).Number.ToString(System.Globalization.CultureInfo.InvariantCulture))
                Dim pdfObj As PdfObject = pdfr.GetPdfObject(XrefIndex)
                Dim pdfStrem As PdfStream = DirectCast(pdfObj, PdfStream)
                If PdfName.IMAGE.Equals(type) Then
                    Dim bytes As Byte() = PdfReader.GetStreamBytesRaw(DirectCast(pdfStrem, PRStream))
                    If (bytes IsNot Nothing) Then
                        Dim strat As New ImageInfoTextExtractionStrategy()
                        iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(pdfr, pn, strat)
                    End If
                End If
            End If
        Next
    End If
Next
Why your current code does not find or extract those shapes:
The smiley image and the flower image are completely different in nature: the flower image is a bitmap image stored in the PDF as an /XObject (eXternal Object) of subtype /Image, while the smiley is a vector image stored in the PDF as part of the page content stream, as a (not necessarily contiguous) sequence of path definition and drawing operations.
Your code only searches for bitmap images stored as external objects, and it does so in a somewhat convoluted way: it first scans for image xobjects using low-level methods, and only if it finds such an xobject does it employ the iText high-level extraction capabilities. If it started out using only the iText image extraction capabilities, it would be less complex, and at the same time it would also recognize inlined bitmap images.
You might want to look at the iText in Action — 2nd Edition chapter 15 Webified iTextSharp Examples, especially ExtractImages.cs which utilizes MyImageRenderListener.cs for this. While inspiration by that code could improve your current code, it won't yet help you with your issue at hand, though.
What you have to do to find or extract the shapes using iText:
Unfortunately your question is not entirely clear on what you actually are trying to achieve.
Do you merely need to detect whether there is some image (bitmap or vector graphic) on some page?
Do you need some information on the image, e.g. size or position on the page?
Or do you actually want to extract the image?
While these objectives can easily be implemented for bitmap graphics using the aforementioned iText high-level extraction capabilities, they are fairly difficult to fulfill for vector graphics.
For generic PDFs they are virtually impossible to implement because the drawing operations for an individual figure need not be together, and even worse, the drawing operations for different figures on the same page, for underlines on the page, and for other graphic effects might even be mixed in one big heap of operations in seemingly random order.
In your case, though, you have one advantage: Office seems to properly tag the figures in the PDF. This at least makes detection of the number of different (i.e. differently tagged) vector graphics on a page easy and also allows for the differentiation which drawing operation belongs to which figure.
Thus, here are some pointers on how to achieve the goals mentioned above for PDFs tagged like your sample PDF. As I'm not using VB myself, I don't have sample code. But as your sample code shows that you already know how to follow object references and how to interpret PDF object information, these pointers should suffice to show the way.
1. Detecting whether there is some image on some page.
As the page content is tagged, it suffices to scan the structure hierarchy starting from the /StructTreeRoot entry in the document catalogue (use PdfReader.Catalog, select the value of PdfName.STRUCTTREEROOT in it, and dig into it).
E.g. for page 1 (in PDF object 4 0) of your sample (with the "1233" at the top and the smiley below) you'll find an array with dictionaries:
<<
/Pg 4 0 R
/K [0]
/S /P
/P 24 0 R
>>
and
<<
/Pg 4 0 R
/K [1]
/Alt ()
/S /Figure
/P 22 0 R
>>
each of which references the page (/Pg 4 0 R). The first one is of type /P, a paragraph (your "1233"), and the second one is of type /Figure, a figure (your smiley). The presence of the second element indicates the presence of a figure on the page. Thus, the goal 1 is achieved already with these data.
(For details cf. the PDF specification ISO 32000-1:2008 section 14.7 and 14.8.)
2. Retrieving some information on the image, e.g. size or position on the page.
For this you have to extract the graphics operators responsible for creating the figure in question. As it is tagged, you have to extract the operators in a marked content block associated with the marked content ID given by /K [1] in the /Figure dictionary above, i.e. 1.
In the content stream you'll find this:
/P <</MCID 1>> BDC 0.31 0.506 0.741 rg
108.6 516.6 m
108.6 569.29 160.18 612 223.8 612 c
287.42 612 339 569.29 339 516.6 c
339 463.91 287.42 421.2 223.8 421.2 c
160.18 421.2 108.6 463.91 108.6 516.6 c
h
f*
[...]
108.6 516.6 m
108.6 569.29 160.18 612 223.8 612 c
287.42 612 339 569.29 339 516.6 c
339 463.91 287.42 421.2 223.8 421.2 c
160.18 421.2 108.6 463.91 108.6 516.6 c
h
S
EMC
This section between BDC for /MCID 1 and EMC contains the graphics operations you seek. If you want to get some information on the figure they represent, you have to analyze them.
This is a very low-level view of all this, and one might wish for a higher-level API to retrieve it.
iText does have a high level API for the analogous operations for text and bitmap image processing using the parser namespace class PdfReaderContentParser together with some apropos RenderListener implementation like your ImageInfoTextExtractionStrategy. Unfortunately, though, PdfReaderContentParser does not yet properly pre-process the vector graphics related operators.
To do this with iText, therefore, you either have to extend the underlying PdfContentStreamProcessor to add the missing pre-processing (which is do-able as that parser class is implemented using separate listeners for the individual operations, and you can easily register new listeners for the graphics operators); or you have to retrieve the page content and parse it yourself.
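For the second option (retrieving the page content and parsing it yourself), a rough sketch in plain Java, without iText, might look like the following. It rests on several simplifying assumptions: the content stream is already decoded into a string, marked content sections are not nested, the current transformation matrix is ignored, and only the m, l and c path operators are considered. The class and method names are made up. Since a Bézier curve lies within the hull of its control points, the box computed from those points can only be larger than the figure, never smaller.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MarkedContentBounds {
    /** Returns the text between the BDC for the given MCID and the next EMC, or null if not found. */
    static String section(String content, int mcid) {
        Matcher m = Pattern.compile("/MCID\\s+" + mcid + "\\s*>>\\s*BDC").matcher(content);
        if (!m.find()) return null;
        int end = content.indexOf("EMC", m.end());
        return end < 0 ? null : content.substring(m.end(), end);
    }

    /** Returns {minX, minY, maxX, maxY} over all operand points of m, l and c, or null if there are none. */
    static double[] bounds(String section) {
        List<Double> operands = new ArrayList<>();
        double[] box = null;
        for (String token : section.trim().split("\\s+")) {
            try {
                operands.add(Double.parseDouble(token));
                continue; // a number: keep collecting operands
            } catch (NumberFormatException e) {
                // not a number, so the token is treated as an operator
            }
            if (token.equals("m") || token.equals("l") || token.equals("c")) {
                for (int i = 0; i + 1 < operands.size(); i += 2) {
                    double x = operands.get(i), y = operands.get(i + 1);
                    if (box == null) box = new double[] {x, y, x, y};
                    box[0] = Math.min(box[0], x);
                    box[1] = Math.min(box[1], y);
                    box[2] = Math.max(box[2], x);
                    box[3] = Math.max(box[3], y);
                }
            }
            operands.clear(); // operands belong to the operator just seen
        }
        return box;
    }

    public static void main(String[] args) {
        // First path of the smiley from the content stream quoted above.
        String content = "/P <</MCID 1>> BDC 0.31 0.506 0.741 rg\n108.6 516.6 m\n"
                + "108.6 569.29 160.18 612 223.8 612 c\n287.42 612 339 569.29 339 516.6 c\n"
                + "339 463.91 287.42 421.2 223.8 421.2 c\n160.18 421.2 108.6 463.91 108.6 516.6 c\nh\nf*\nEMC";
        System.out.println(Arrays.toString(bounds(section(content, 1))));
    }
}
```

For real documents a proper content stream tokenizer is needed (strings, inline images and nested marked content would all confuse this sketch), but it shows how size and position follow from the operands.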
3. Extracting the image.
As the vector images inside a PDF use PDF specific vector graphics operators, you first have to decide in which format you want to export the image. Unless you are interested in the raw PDF operators, you will most likely require some library helping you to create a file in the desired format.
Once that is decided, you first extract the graphics operators in question as explained before and then feed them to that library to create an exportable image of your choice.
I am trying to extract all the images in a PDF and then convert them into DIB format. The first part is easy: I extract all the contents of the PDF, then iterate through them, and whenever I find a PDEImage, I put it in an array.
But I am clueless about how to go about the second part. Looks like all the AVConversion methods allow you to convert a whole page of a PDF, not just images, into other formats.
Is there any way I can accomplish this task? Thanks in advance!
EDIT: Further elaborating the problem.
I am writing an Adobe Acrobat Plug-in using Visual C++ with .NET Framework 4.
The purpose of the plug-in is to (among other things) extract image data from a PDF file and then convert that data to DIBs. The need to convert to DIBs arises because I then pass those DIBs to another library which does some image correction work on them.
Now my problem is with converting the said image data in PDFs to DIBs. The image data in PDFs is found in a format called PDEImage, which apparently contains all the color data of the image. I'm using the following code to extract the image data bits from the image, to be used with CreateCompatibleBitmap() and SetBitmapBits() to obtain an HBITMAP handle. Then I pass that, along with the other necessary parameters, to GetDIBits() to obtain a DIB in the form of a byte array, as stated in the MSDN.
void GetDIBImage(PDEElement element)
{
    //Obtaining a PDEImage
    PDEImage image;
    memset(&image, 0, sizeof(PDEImage));
    image = (PDEImage)element;
    //Obtaining attributes (such as width, height)
    //of the image for later use
    PDEImageAttrs attrs;
    memset(&attrs, 0, sizeof(attrs));
    PDEImageGetAttrs(image, &attrs, sizeof(attrs));
    //Obtaining image data from PDEImage to a byte array
    ASInt32 len = PDEImageGetDataLen(image);
    byte *data = (byte *)malloc(len);
    PDEImageGetData(image, 0, data);
    //Creating a DDB using said data
    HDC hdc = CreateCompatibleDC(NULL);
    HBITMAP hBmp = CreateCompatibleBitmap(hdc, attrs.width, attrs.height);
    LONG bitsSet = SetBitmapBits(hBmp, len, data); //Here bitsSet gets a value of 59000 which is close to the image's actual size
    //Buffer which GetDIBits() will fill with DIB data
    unsigned char* buff = new unsigned char[len];
    //BITMAPINFO structure to be passed to GetDIBits()
    BITMAPINFO bmpInfo;
    memset(&bmpInfo, 0, sizeof(bmpInfo));
    bmpInfo.bmiHeader.biSize = sizeof(BITMAPINFOHEADER);
    bmpInfo.bmiHeader.biWidth = (LONG)attrs.width;
    bmpInfo.bmiHeader.biHeight = (LONG)attrs.height;
    bmpInfo.bmiHeader.biPlanes = 1;
    bmpInfo.bmiHeader.biBitCount = 8;
    bmpInfo.bmiHeader.biCompression = BI_RGB;
    bmpInfo.bmiHeader.biSizeImage = ((((bmpInfo.bmiHeader.biWidth * bmpInfo.bmiHeader.biBitCount) + 31) & ~31) >> 3) * bmpInfo.bmiHeader.biHeight;
    bmpInfo.bmiHeader.biXPelsPerMeter = 0;
    bmpInfo.bmiHeader.biYPelsPerMeter = 0;
    bmpInfo.bmiHeader.biClrUsed = 0;
    bmpInfo.bmiHeader.biClrImportant = 0;
    //Calling GetDIBits()
    //Here scanLines gets a value of 0, while buff receives no data.
    int scanLines = GetDIBits(hdc, hBmp, 0, attrs.height, &buff, &bmpInfo, DIB_RGB_COLORS);
    if(scanLines > 0)
    {
        MessageBox(NULL, L"SUCCESS", L"Message", MB_OK);
    }
    else
    {
        MessageBox(NULL, L"FAIL", L"Message", MB_OK);
    }
}
Here are my questions / concerns.
Is it correct the way I'm using CreateCompatibleDC(), CreateCompatibleBitmap() and SetBitmapBits() functions? My thinking is that I use CreateCompatibleDC() to obtain current DC, then create a DDB using CreateCompatibleBitmap() and then set the actual data to the DDB using SetBitmapBits(). Is that correct?
Is there a problem with the way I've created the BITMAPINFO structure? I am under the assumption that it needs to contain all the details regarding the format of the DIB I will eventually obtain.
Why am I not getting the bitmap data as a DIB to the buff when I call GetDIBits()?
I do not know the library you use to access the inner structures of a PDF file, but the problem at hand has three distinct subproblems:
Find all images in the PDF file
Decode the images to their components
Convert the decoded image to a DIB
Find all Images
Images can occur inside content streams or in streams attached to dictionaries. To find all images in content streams, you need to find all content streams in either Pages, XObjects or Patterns. Each of those can have a Resources -> XObject dictionary that references all XObjects (and an XObject can be an Image).
If you ignore inline images, you might simply scan the PDF file and decode each dictionary that is of type XObject, subtype Image.
Decode
All streams, whether inline in content streams or in separate objects in the PDF file, are encoded and might need post-processing using the Decode arrays. There are several filters that you need to be able to handle for decoding. Flate decode (ZLIB), JPEG and CCITT (fax G3/G4) are probably the most used for images. Hopefully the PDF library you use will know how to decode the streams.
Next there are Decode arrays (a bit rare) where each color component can be scaled from an input value to an output value. This is a linear transformation.
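As a small illustration of that linear transformation, here is a sketch in plain Java (class and method names are made up). It assumes the usual interpolation from the PDF specification, where a raw sample between 0 and 2^n - 1 for n bits per component is mapped onto the interval [Dmin, Dmax] given by the Decode array.

```java
final class DecodeArray {
    /** Maps a raw sample (0 .. 2^bits - 1) linearly into the range [dMin, dMax]. */
    static double decode(int sample, int bitsPerComponent, double dMin, double dMax) {
        int maxSample = (1 << bitsPerComponent) - 1;
        return dMin + sample * (dMax - dMin) / maxSample;
    }

    public static void main(String[] args) {
        // Default Decode [0 1] for an 8 bit gray image, and the inverting Decode [1 0].
        System.out.println(decode(255, 8, 0.0, 1.0));
        System.out.println(decode(255, 8, 1.0, 0.0));
    }
}
```

A Decode array of [1 0] therefore inverts the image, which is the case most likely to show up in practice.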
To DIB
Next in line is the conversion of the decoded image to a DIB. This means you need to convert the color components to something Windows can 'get' (e.g. palette, grayscale (a special palette) or RGB). PDF supports a very large variety of color spaces, and converting them to RGB is no sinecure. Your best hope here is that the PDFs you need to process only use a select subset (like RGB and palette). Now a DIB can simply be created by creating the bitmap header (BITMAPINFO), filling in all the data, calling the DIB creation function CreateDIBSection, and then processing the DIB the way your application needs.
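One detail that is easy to get wrong when filling in the pixel data is the row layout. The following plain Java sketch (class and method names are made up) shows the arithmetic only, not the Windows API calls. It assumes the decoded PDF image has tightly packed rows running top to bottom, and that the target is a bottom-up DIB (positive biHeight) whose rows are padded to a multiple of 4 bytes. Color conversion and channel order are deliberately left out.

```java
final class DibLayout {
    /** Bytes per DIB scan line: rows are padded to a multiple of 4 bytes. */
    static int stride(int width, int bitCount) {
        return ((width * bitCount + 31) & ~31) >> 3;
    }

    /** Copies tightly packed top-down rows into a padded bottom-up DIB pixel buffer. */
    static byte[] toBottomUp(byte[] packed, int width, int height, int bitCount) {
        int rowBytes = (width * bitCount + 7) / 8; // source rows end on byte boundaries
        int stride = stride(width, bitCount);
        byte[] dib = new byte[stride * height];
        for (int row = 0; row < height; row++) {
            // the first source row becomes the last DIB row
            System.arraycopy(packed, row * rowBytes, dib, (height - 1 - row) * stride, rowBytes);
        }
        return dib;
    }

    public static void main(String[] args) {
        // A 3 x 2 image with 8 bits per pixel: 3 data bytes per row, padded to 4.
        byte[] packed = {1, 2, 3, 4, 5, 6};
        System.out.println(java.util.Arrays.toString(toBottomUp(packed, 3, 2, 8)));
    }
}
```

The stride expression is the same one used for biSizeImage in the question's code, so the buffer handed to Windows should be sized as stride times height rather than by the length of the PDF image data.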
Epilogue
All in all: being able to process all PDF files and find all images is quite a daunting task. If you control the source of the PDFs and you know that the images are always in DeviceRGB format, always JPEG etc., and never inlined into the content stream, it is doable.
I'm creating a pdf file using BIRT reporting library. Later I need to digitally sign these files. I'm using iText to digitally sign the document.
The issue I'm facing is that I need to place the signature in different places in different reports. I already have the code to digitally sign the document; currently I always place the signature at the bottom of the last page of every report.
Eventually I need each report to specify where the signature should be placed. Then I have to read that location using iText and place the signature there.
Is this possible to achieve using BIRT and iText?
Thanks
If you're willing to cheat a bit, you can use a link... which BIRT supports according to my little dive into their docs just now.
A link is an annotation. Sadly, iText doesn't support examining annotations at a high level, only generating them, so you'll have to use the low-level object calls.
The code to extract it might look something like this:
// getPageN is looking for a page number, not a page index
PdfDictionary lastPageDict = myReader.getPageN(myReader.getNumberOfPages());
PdfArray annotations = lastPageDict.getAsArray( PdfName.ANNOTS );
PdfArray linkRect = null;
if (annotations != null) {
    int numAnnots = annotations.size();
    for (int i = 0; i < numAnnots; ++i) {
        PdfDictionary annotDict = annotations.getAsDict( i );
        if (annotDict == null)
            continue; // it'll never happen, unless you're dealing with a Really Messed Up PDF.
        if (PdfName.LINK.equals( annotDict.getAsName( PdfName.SUBTYPE ) )) {
            // if this isn't the only link on the last page, you'll have to check the URL, which
            // is a tad more work.
            linkRect = annotDict.getAsArray( PdfName.RECT );
            // a little sanity check here wouldn't hurt, but I have yet to come across a PDF
            // that was THAT screwed up, and I've seen some Really Messed Up PDFs over the years.
            // and kill the link, it's just there for a placeholder anyway.
            // iText doesn't maintain any extra info on links, so no need for other calls.
            annotations.remove( i );
            break;
        }
    }
}
if (linkRect != null) {
    // linkRect is an array, thusly: [ llx, lly, urx, ury ].
    // you could use floats instead, but I wouldn't go with integers.
    double llx = linkRect.getAsNumber( 0 ).getDoubleValue();
    double lly = linkRect.getAsNumber( 1 ).getDoubleValue();
    double urx = linkRect.getAsNumber( 2 ).getDoubleValue();
    double ury = linkRect.getAsNumber( 3 ).getDoubleValue();
    // make your signature
    magic();
}
If BIRT generates some text in the page contents under the link for its visual representation, that's only a minor issue. Your signature should cover it completely.
You're definitely better off if you can generate the signature directly from BIRT in the first place, but my little inspection of their docs didn't exactly fill me with confidence in their PDF customization abilities... despite sitting on top of iText themselves. It's a report generator that happens to be able to produce PDFs... I shouldn't expect too much.
Edit: If you need to look for the specific URL, you'll want to look at section "12.5.6.5 Link Annotations" of the PDF Reference, which can be found here:
http://www.adobe.com/content/dam/Adobe/en/devnet/pdf/pdfs/PDF32000_2008.pdf
I don't know anything about BIRT, and have only a little familiarity with iText. But maybe this works...
Can BIRT generate the signature box's outline as a regular form field with a given field name? If so, then you should be able to:
Look up that field by name in iText's AcroFields hashmap, using getField;
Create a new signature using the pdf stamper, and set its geometry based on the values of the old field object; and
Delete the old field using removeField.