PDF forms in Apache FOP generated PDF

PDF forms in Apache FOP generated PDF - apache

I am using FOP to generate the PDF from XSL:FO and iText to apply the digital signature and signed date field in the PDF.
Problem is i am using co ordinates to apply the signature and date in the generated PDF.
The PDF are large, dynamic and changing always based on content. therefore the signature and date fields were not positioned Properly by the co-ordinates specified.
I googled to find solution for a long days. But i didnt get any solutions.
Can u please any one suggest me to create the PDF form fields in the PDF using FOP?
Then i can apply signature and date fields by using iText.
Please tell me any other technology to try to solve this problem?

The problem is that FOP doesn't create signature fields whereas iText needs a page number and coordinates (either defined by you or by a signature field).
Where do you want the signature to be placed?
Is it always on the first page? Always on the last page?
Can you put some unique text at the location where you want the document to be signed?
I'm asking this because you could put words like SIGN HERE on the last page, and then use TextRenderInfo to retrieve the coordinates of those words. See http://itextpdf.com/examples/iia.php?id=275 in combination with http://itextpdf.com/examples/iia.php?id=282
The TextRenderInfo class has methods such as getBaseLine(), getDescentLine(),... who give you LineSegment object which reveal the coordinates of each snippet of text in your PDF.
There are plenty of caveats: FOP could cut the words SIGN HERE in different snippets, such as "SIG", "N", "HE", "RE" which would make it difficult to recognize the unique string, but it's worth investigating.

Related

"Re-paginate" PDF using iText

Disclaimer:
I am using iText 5. I know this is generally frowned upon (vs. using iText 7), but I am working with considerable legacy code that uses iText 5, and upgrading does not fall under my control.
Requirements:
A "simple" PDF/A is received as input (text only, these are generated from RTF), as well as a float value corresponding to a desired first page length in inches.
A PDF/A must be output that is identical to the input PDF, except it is paginated as follows: first page length = input value; each subsequent (not first or last) page will fill a standard page length; the last page will be truncated a constant number of points below the content nearest the bottom of the page. Note that input and output width will be identical and constant.
Progress / Approach:
I have extended the SimpleTextExtractionStrategy to generate XML containing font information (size and family, bold or italics, etc.) as well as location information (relative an absolute coordinate system where the origin is at the top left corner of the first page of the input PDF) for each "span" of text extracted from the input PDF.
I then generate a new PDF page by page (where each page is the desired length according to the requirements outlined above), filtering the extracted XML info with LINQ based on the bounds of each new page, and adding appropriately formatted text at the appropriate location using ColumnText.ShowTextAligned(...).
Problem:
The approach outlined above does fine. It generates PDFs with the desired page structure, but some information is lost in translation, namely colored text and underlined text. While colored text shouldn't be seen in these PDFs, underlined text absolutely must be detected.
This set of requirements should also include PDFs with tables. I originally planned on implementing a different module that adheres to the same interface for table PDFs, as these are generated and used separately from the PDFs generated from RTF, and iText has relatively strong table functionality built in.
The two concerns outlined above, coupled with the fact that my described approach was born out of an attempt to reuse existing code leads me to believe that an entirely different approach may be necessary or at least much better. It seems to me that there should be a way to capture content byte info and clip it as necessary to "re-paginate" the input PDF, only worrying about moving content that falls along a page boundary.
Essentially, I am looking for (iText based) recommendations for a better approach. Pseudo-code type answers or simply recommendations for classes / interfaces that may help are acceptable. While it would be nice to handle text and tables together, any advice pertinent to one or the other would also be appreciated. I have perused much of the available documentation on the iText website and other SO questions, but have not found quite what I'm looking for.
Note that no code is included in this question as I am looking for a high-level approach that is entirely different from what I have tried.
Edit:
I didn't notice it before, but the way in which I was reusing fonts (similar to this) resulted in some unexpected (but documented as such) behavior. It seems that I will need to avoid extracting information for re pagination at the text level, as it will be difficult to ensure continuity of fonts between input and output.

I solved this problem a while ago, but figured I would post my solution. I'm sure it's not the most efficient solution, but it works well for my purposes. Note that this will re-paginate a PDF as described in the question containing text only. Table PDF's are handled separately.
The basic process is this:
Use a custom TextExtractionStrategy to extract XML containing information regarding ascent and descent lines for all text in the input PDF, as well as what page it originally appears on.
Given the page length requirements as described in the question (first page = input value, subsequent = standard length, last page = fit content) and the XML info regarding text positions, determine what content will fit on each page of the output PDF. Create a map of where each input page will need to be cropped (top and bottom, note that each input page may be cropped more than once), as well as a map of which cropped pages will need to be "concatenated" together in the final output.
Copy the input PDF page by page to an intermediate temporary PDF (using PdfCopier). If an input page must be cropped more than once (ex: first 2 inches of input page 1 = page 1 output, next 6 inches of input page 1 = page 2 output, final 0.5 inch of input page 1 = top of page 3 output), ensure that it is copied the appropriate number of times (1 time per crop).
Crop each page of the intermediate copied PDF appropriately. This is done by modifying the MediaBox and / or CropBox.
Concatenate the appropriate cropped pages together into the final output PDF's pages. I used a PdfWriter to first create a new page of the appropriate height, then add each appropriate cropped page at the appropriate position in the output PDF page's byte content usingcontentByte.AddTemplate(inputCroppedPage, 0, bottomOfLastAddedCroppedPage).
To anyone who managed to read and understand all of that, congratulations. To anyone else, please let me know what you if you are confused. The solution described above is a little twisted and tough to put into words. While there is too much code to post here (and I am not at liberty to share the code on GitHub or similar), I would be happy to answer any questions that will help someone else implement something similar.
The TextExtractionStrategy mentioned in step 1 was inspired by this answer. Essentially, I used System.Xml.Linq to create an XML document rather than concatentating strings to form HTML, and I ignored any font information, storing only information regarding where text is located in the page (you'll see that this information is available in the linked answer, just isn't written into the final HTML).

How to read the value of fields in signed PDF using PDF Box API

Once after completion of digital signature for PDF using DocuSign, How to read value of the fields in PDF using field ID/Name (using PDF Box API)? I am not able get the field ids of Digitally Signed PDF.

The sample PDF showed that the fields in PDF are not PDF form fields after all, neither Acrofornm nor XFA, they merely are texts with some lines around them. (They may once have been PDF form fields which were flattened, or they may never have been PDF form fields to start with.)
Thus, your only option left is text extraction. PDFBox has a quite elaborate text extraction engine. Have a look at PDFTextStripper. You can try and use this class as is, looking in the extracted string for the field labels and extract the following text until the end-of-line; or if you have the time, you can try to make use of the internal PDF structure where the field contents are in a separate Xobject.

iTextSharp - when extracting a page it fails to carry over Adobe rectangle highlighting important info

Per the following site...
http://forums.asp.net/t/1630140.aspx?extracting+pdf+pages+using+itextsharp
...I use the function ExtractPages to produce a new PDF based on range of page numbers. My problem is that I noticed a PDF that had a rectangle on the 2nd page was not extracted along with the page. This causes me some fear that perhaps Adobe comments are not being carried over as well as the pages get extracted.
Is there a way I can adjust this code to take into consideration to bring over comments and objects like rectangles to the new PDF when ExtractPages is called? Am I missing a syntax or is that not available with version 5.5.0 of iTextSharp?

Your use of the verb extract in the context of extracting pages is confusing. People will think you want to extract text from a page. In reality, you want to import or copy pages.
The example you refer to uses PdfWriter. That's wrong: you should use PdfStamper (if only one existing PDF is involved) or PdfCopy (if multiple existing PDFs are involved). See my answer to the question How to keep original rotate page in itextSharp (dll) to find out why the example on forums.asp.net is a really, really bad example.
The fact that a page has "a rectangle" is irrelevant. Maybe the rectangle is an annotation. In that case, you're throwing that rectangle away by using the wrong example. Maybe the origin of the page is different from 0,0...
If your purpose is to create a new PDF containing only a selection of pages of the original PDF, please read my answer to Function that I can use to remove a single page from a PDF using iText

batch edit a PDF by adding a barcode based on text in the PDF?

I receive around 30 workorders a day from my primary client. They send them to me in a standardized report format, in a single PDF, with one page for each different workorder. Unfortunately, these PDF reports dont include the workrorder_ID in a barcoded format, only in regular text font and they are unwilling to comply to my request to modify the report by adding a barcode. Is there a way to automatically add a barcode to the PDF? basically I would want the PDF editing app to search for the text “workoder ID:” and to insert the barcode, beneath the work_order ID.
please advise. thanks very much

You will need to use a PDF Library that includes a text extraction that includes reporting the location of each string extracted. When you find the location of the Work Order ID text you can then use the same library add the barcode in correct position. Quick PDF Library would be one option and iText be another.
http://www.quickpdflibrary.com
http://itextpdf.com
Disclaimer : I do some consulting for Quick PDF Library

Append PDF to a Signed PDF

I need to append a pdf file to a digital signed pdf file, keeping valid the signature ...maybe using revision? ...using iTextSharp? How can I do it?
Please help me with some sample.

You can use Increment Update to do that, as long as the original signature allows you.
Take look at the document:
http://learn.adobe.com/wiki/download/attachments/52658564/Acrobat_DigitalSignatures_in_PDF.pdf?version=1&modificationDate=1269905473000
on Page 8 you will find what you want.

You can't as that invalidates the whole point of digital signatures, namely to detect when something exactly as you describe occurs and therefore ensure the validity of the original document. To do as you want, you will need to add the extra PDF to the unsigned original PDF and then resign the new conglomerate PDF.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas