How to read the value of fields in a signed PDF using the PDFBox API

After a PDF has been digitally signed with DocuSign, how can I read the values of its fields by field ID or name using the PDFBox API? I am not able to get the field IDs of the digitally signed PDF.

The sample PDF showed that the fields in the PDF are not PDF form fields after all, neither AcroForm nor XFA; they are merely text with some lines drawn around it. (They may once have been PDF form fields that were flattened, or they may never have been PDF form fields to begin with.)
Thus, your only remaining option is text extraction. PDFBox has a quite elaborate text extraction engine; have a look at PDFTextStripper. You can use this class as is, searching the extracted string for the field labels and taking the text that follows each label up to the end of the line. Or, if you have the time, you can try to make use of the internal PDF structure, where the field contents are in a separate XObject.
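A minimal sketch of the first option, assuming the page text has already been extracted (with PDFBox 2.x that is roughly `new PDFTextStripper().getText(PDDocument.load(file))`; PDFBox 3.x loads the document with `Loader.loadPDF(file)` instead). The labels used below are hypothetical; use the ones printed on the signed document. The helper takes the text after each label up to the end of its line.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/**
 * Pulls "label: value" pairs out of text already extracted from a PDF
 * (e.g. by PDFBox's PDFTextStripper). Labels are hypothetical examples.
 */
class LabelValueExtractor {

    /** Returns, for each label found, the text after it up to the end of that line. */
    static Map<String, String> extract(String text, List<String> labels) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String label : labels) {
            int start = text.indexOf(label);
            if (start < 0) {
                continue; // label does not appear in the extracted text
            }
            int valueStart = start + label.length();
            int lineEnd = text.indexOf('\n', valueStart);
            if (lineEnd < 0) {
                lineEnd = text.length();
            }
            // strip() also drops a trailing '\r' from Windows-style line endings
            values.put(label, text.substring(valueStart, lineEnd).strip());
        }
        return values;
    }

    public static void main(String[] args) {
        String sample = "Name: Jane Doe\r\nDate Signed: 2021-03-04\nTitle: Engineer\n";
        System.out.println(extract(sample, List.of("Name:", "Date Signed:", "Missing:")));
    }
}
```

Plain substring search can match a label inside a longer one (for example "Name:" inside "Company Name:"), so anchoring on line starts or using a regular expression may be safer on real forms.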

Related

Is there a way to save a dynamic PDF as static but still keep interactive fields within the document?

I have a dynamic PDF that I want to use with DocuSign, so it needs to be static. I cannot simply use the print-to-PDF function, because I still want to use the interactive fields within the PDF.
I tried to use Adobe AEM Forms Designer to save the document as static, but only the first page of the form is saved.

You may want to look into PDF form field transformation. It automatically transforms PDF form fields into DocuSign tabs, carrying over all of their existing values, and places each created tab at the location of the field from which it was generated.
To transform PDF form fields into DocuSign tabs, set the transformPdfFields property on each document whose fields you want to transform.
https://developers.docusign.com/docs/esign-rest-api/esign101/concepts/tabs/pdf-transform/
https://www.docusign.com/blog/developers/the-trenches-pdf-form-field-transformation
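For illustration, here is roughly what one entry of an envelope definition's documents array looks like with transformPdfFields set, built with plain Java string handling rather than the DocuSign SDK. The document ID and file name are hypothetical, and the property is sent as the string "true"; check the exact request shape against the DocuSign pages linked above.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

/**
 * Builds the JSON for one "documents" entry with transformPdfFields enabled.
 * Sketch only: plain string building, no JSON library or DocuSign SDK.
 */
class TransformPdfFieldsDocument {

    static String documentJson(String documentId, String fileName, byte[] pdfBytes) {
        String base64 = Base64.getEncoder().encodeToString(pdfBytes);
        return "{"
                + "\"documentId\":\"" + escape(documentId) + "\","
                + "\"name\":\"" + escape(fileName) + "\","
                + "\"fileExtension\":\"pdf\","
                + "\"documentBase64\":\"" + base64 + "\","
                // Ask DocuSign to turn the PDF's form fields into tabs.
                + "\"transformPdfFields\":\"true\""
                + "}";
    }

    /** Escapes backslashes and double quotes only; control characters are not handled. */
    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    public static void main(String[] args) {
        byte[] fakePdf = "%PDF-1.7".getBytes(StandardCharsets.US_ASCII); // placeholder bytes
        System.out.println(documentJson("1", "dynamic-form.pdf", fakePdf));
    }
}
```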
Best Regards,
Eric | DocuSign

PDF type recognition programmatically

I need my code to recognize whether a PDF file is a dynamic form or a read-only PDF. The programming language doesn't matter.
Is there any way to detect whether my target PDF is a form or not?

What you're looking for is the presence of an AcroForm dictionary in the document's Catalog dictionary. The AcroForm dictionary will be present if the PDF has any kind of form field, signature field, or XFA fields. You'll still need some sort of PDF library to parse the PDF objects, but there's one available in most languages at this point.
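If you want a quick check without a PDF library, a crude heuristic is to scan the raw bytes for the /AcroForm key (and /XFA for XFA forms). This is only a sketch: it misses catalogs stored in compressed object streams (PDF 1.5 and later) and can be fooled by the token appearing in other content, so a real parser is more reliable. With PDFBox, for example, `document.getDocumentCatalog().getAcroForm()` returns null when there is no AcroForm, and PDAcroForm offers `hasXFA()` and `xfaIsDynamic()`.

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Crude heuristic: looks for the /AcroForm and /XFA name tokens in the raw bytes.
 * Misses catalogs inside compressed object streams (PDF 1.5+) and can report
 * false positives if the token appears in other content.
 */
class AcroFormSniffer {

    enum FormKind { NONE, ACROFORM, XFA }

    static FormKind classify(byte[] pdf) {
        // ISO-8859-1 maps each byte to exactly one char, so binary data survives the conversion.
        String raw = new String(pdf, StandardCharsets.ISO_8859_1);
        if (!containsName(raw, "/AcroForm")) {
            return FormKind.NONE;
        }
        return containsName(raw, "/XFA") ? FormKind.XFA : FormKind.ACROFORM;
    }

    /** True if the name occurs and is not merely the prefix of a longer name. */
    static boolean containsName(String raw, String name) {
        int from = 0;
        while (true) {
            int i = raw.indexOf(name, from);
            if (i < 0) {
                return false;
            }
            int end = i + name.length();
            if (end == raw.length() || isDelimiterOrWhitespace(raw.charAt(end))) {
                return true;
            }
            from = i + 1;
        }
    }

    /** PDF delimiter and white-space characters, which end a name token. */
    static boolean isDelimiterOrWhitespace(char c) {
        return "()<>[]{}/% \t\r\n\f\0".indexOf(c) >= 0;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java AcroFormSniffer file.pdf");
            return;
        }
        System.out.println(classify(Files.readAllBytes(Path.of(args[0]))));
    }
}
```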
That said, several PDF viewers and online services do allow users to just type onto static PDF files that don't have interactive form fields. I'm not sure if you want to control for that case though.

PDF form fields in an Apache FOP-generated PDF

I am using FOP to generate the PDF from XSL-FO and iText to apply the digital signature and a signed-date field to the PDF.
The problem is that I am using coordinates to place the signature and date in the generated PDF.
The PDFs are large, dynamic, and always changing based on their content, so the signature and date fields are not positioned properly at the specified coordinates.
I searched for a solution for many days but didn't find one.
Can anyone suggest how to create PDF form fields in the PDF using FOP?
Then I could apply the signature and date fields using iText.
Or is there any other technology I could try to solve this problem?

The problem is that FOP doesn't create signature fields whereas iText needs a page number and coordinates (either defined by you or by a signature field).
Where do you want the signature to be placed?
Is it always on the first page? Always on the last page?
Can you put some unique text at the location where you want the document to be signed?
I'm asking this because you could put words like SIGN HERE on the last page, and then use TextRenderInfo to retrieve the coordinates of those words. See http://itextpdf.com/examples/iia.php?id=275 in combination with http://itextpdf.com/examples/iia.php?id=282
The TextRenderInfo class has methods such as getBaseline() and getDescentLine() that return LineSegment objects revealing the coordinates of each snippet of text in your PDF.
There are plenty of caveats: FOP could split the words SIGN HERE into different snippets, such as "SIG", "N", "HE", "RE", which would make it difficult to recognize the unique string, but it's worth investigating.
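One way to cope with the split-snippet caveat is to join the snippets before searching. The sketch below assumes you have already collected each snippet's text and baseline start point, for example in an iText render listener via `renderInfo.getText()` and `renderInfo.getBaseline().getStartPoint()`; the TextChunk record is a hypothetical holder for those values. Whitespace is ignored because the gap in SIGN HERE is often produced by positioning rather than by a space glyph, and the snippets are assumed to arrive in reading order, which is not guaranteed for every PDF.

```java
import java.util.ArrayList;
import java.util.List;

/** Finds a marker such as "SIGN HERE" even when it was split into several snippets. */
class MarkerLocator {

    /** Hypothetical holder: one text snippet and the start of its baseline (PDF user space). */
    record TextChunk(String text, float x, float y) {}

    /** Returns the snippet in which the marker begins, or null if the marker is absent. */
    static TextChunk locate(List<TextChunk> chunks, String marker) {
        String target = marker.replaceAll("\\s+", "");
        if (target.isEmpty()) {
            return null;
        }
        StringBuilder joined = new StringBuilder();
        List<TextChunk> owner = new ArrayList<>(); // owner.get(i) produced joined.charAt(i)
        for (TextChunk chunk : chunks) {
            for (char c : chunk.text().toCharArray()) {
                // Skip whitespace: the gap in "SIGN HERE" is often positioning, not a space glyph.
                if (!Character.isWhitespace(c)) {
                    joined.append(c);
                    owner.add(chunk);
                }
            }
        }
        int index = joined.indexOf(target);
        return index < 0 ? null : owner.get(index);
    }

    public static void main(String[] args) {
        // Snippets as FOP might split them; coordinates are made-up example values.
        List<TextChunk> page = List.of(
                new TextChunk("Total due", 72f, 700f),
                new TextChunk("SIG", 72f, 120f),
                new TextChunk("N", 95f, 120f),
                new TextChunk(" HE", 104f, 120f),
                new TextChunk("RE", 125f, 120f));
        TextChunk hit = locate(page, "SIGN HERE");
        System.out.println(hit == null ? "marker not found" : "sign at x=" + hit.x() + ", y=" + hit.y());
    }
}
```

The returned coordinates could then serve as the lower-left corner of the signature rectangle you pass to iText.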

Batch edit a PDF by adding a barcode based on text in the PDF?

I receive around 30 work orders a day from my primary client. They send them to me in a standardized report format, in a single PDF, with one page for each work order. Unfortunately, these PDF reports don't include the work order ID as a barcode, only as regular text, and the client is unwilling to modify the report to add one. Is there a way to automatically add a barcode to the PDF? Basically, I want the PDF editing app to search for the text “Work Order ID:” and insert the barcode beneath the work order ID.
Please advise. Thanks very much.

You will need a PDF library whose text extraction reports the location of each extracted string. When you find the location of the Work Order ID text, you can then use the same library to add the barcode in the correct position. Quick PDF Library would be one option and iText another.
http://www.quickpdflibrary.com
http://itextpdf.com
Disclaimer: I do some consulting for Quick PDF Library.
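A sketch of the two library-independent steps, assuming your PDF library has already given you each page's text and the baseline position of the “Work Order ID:” label. The regular expression, the gap, and the barcode size are hypothetical values to adjust. PDF user space has its origin at the bottom-left with y growing upward, so "beneath" means a smaller y. Drawing the barcode itself is left to the library (iText, for instance, has a Barcode128 class).

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Helper steps for stamping a barcode under the "Work Order ID:" label (values are hypothetical). */
class WorkOrderBarcodePlacement {

    // Case-insensitive; tolerates "Work Order ID:", "WorkOrder ID:", "work_order ID :" and similar.
    private static final Pattern WORK_ORDER_ID =
            Pattern.compile("(?i)work[\\s_]*order[\\s_]*ID\\s*:\\s*(\\S+)");

    /** Rectangle in PDF user space: lower-left (llx, lly), upper-right (urx, ury). */
    record Box(float llx, float lly, float urx, float ury) {}

    /** Returns the first token after the label, if the label is on the page. */
    static Optional<String> findWorkOrderId(String pageText) {
        Matcher m = WORK_ORDER_ID.matcher(pageText);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Box for a barcode whose top edge sits `gap` points below the label's baseline. */
    static Box boxBelow(float labelX, float baselineY, float gap, float width, float height) {
        float top = baselineY - gap; // y grows upward, so "below" subtracts
        return new Box(labelX, top - height, labelX + width, top);
    }

    public static void main(String[] args) {
        String page = "Client: ACME\nWork Order ID: WO-12345\nTechnician: J. Smith\n";
        findWorkOrderId(page).ifPresent(id -> {
            Box box = boxBelow(72f, 700f, 6f, 150f, 40f); // label position is an example value
            System.out.println("Barcode for " + id + " at " + box);
        });
    }
}
```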

How to change value of a textbox in a pdf

I have to make several certificates with the same design but different names. So I made an uncompressed PDF file with placeholder text and tried to change it with a text editor. For some reason it didn't work: I could only see a single letter of the replaced text.
When I try the same thing with an EPS file, it works, but since EPS doesn't (AFAIK) keep page orientation, there is a chance that something will come out differently with different names.
Does anyone know why this didn't work, or how to change a text box in a PDF file (with sed)?
(I created the master pdf with Illustrator CS4)
Thank you

In general, editing PDFs in a text editor is a Bad Idea. PDFs depend on the byte offsets of their objects not moving, because the cross-reference table records where each object starts.
If you KNOW your editor won't change the EOL bytes (or what it thinks are EOL bytes), and you DO NOT change the length of the text entry's object as a whole, you're okay.
For example:
1 0 obj
<</Type/Annot/Subtype/Widget/V(PlaceHolder Value)/T(Field Title)...>>
endobj
If your new value is longer than "PlaceHolder Value", you're screwed.
Most PDFs contain quite a bit of compressed binary data. Some of that data WILL be misinterpreted as EOL characters. Changing them will:
a: break your compressed stream
b: possibly change the byte offsets of the rest of the PDF.
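To make the length rule concrete, here is a sketch that replaces a placeholder in an uncompressed PDF without moving a single byte: it pads the new value with spaces to the placeholder's length and refuses longer values. It assumes Latin-1 text and does no PDF parsing; the padding spaces become part of the value, and, per the points above, it cannot work inside compressed streams. Treat it as an illustration of why offsets matter, not as a recommended tool.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Same-length placeholder replacement, so cross-reference offsets stay valid. */
class SameLengthPatch {

    static byte[] patch(byte[] pdf, String placeholder, String newValue) {
        if (newValue.length() > placeholder.length()) {
            throw new IllegalArgumentException("new value longer than placeholder; offsets would shift");
        }
        byte[] from = placeholder.getBytes(StandardCharsets.ISO_8859_1);
        // Right-pad with spaces so the replacement occupies exactly the same bytes.
        String padded = newValue + " ".repeat(placeholder.length() - newValue.length());
        byte[] to = padded.getBytes(StandardCharsets.ISO_8859_1);
        int at = indexOf(pdf, from);
        if (at < 0) {
            throw new IllegalArgumentException("placeholder not found (is the stream compressed?)");
        }
        byte[] out = Arrays.copyOf(pdf, pdf.length);
        System.arraycopy(to, 0, out, at, to.length);
        return out;
    }

    /** Naive byte search; returns the first offset of needle in haystack, or -1. */
    static int indexOf(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return i;
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] pdf = "1 0 obj\n<</Type/Annot/Subtype/Widget/V(PlaceHolder Value)/T(Field Title)>>\nendobj\n"
                .getBytes(StandardCharsets.ISO_8859_1);
        byte[] patched = patch(pdf, "PlaceHolder Value", "Jane Doe");
        System.out.println(new String(patched, StandardCharsets.ISO_8859_1));
    }
}
```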
When I hack on PDF files, I always use a hex editor.
Bottom Line: Don't mess with PDFs as a text stream. Mess with them as PDF files, using a PDF library. There's sure to be one capable of altering form field values in your language of choice.
You can also look into FDF and XFDF to see if they'll suit you better. Both file formats store field/value pairs and a reference to the form to use with those pairs. FDF uses PDF's syntax, while XFDF is an XML grammar. You can serve the [X]FDF to your end user and they will see the filled-in form.
WARNING: Unless the form is Reader Enabled (requires Acrobat (pro?)), they won't be able to save the version of the form they get after opening the [X]FDF, only view/print it. Of course they can save the [X]FDF, but many users might balk at this Strange New Format.
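As an illustration of the XFDF route, the sketch below writes the field/value pairs and the `<f href>` reference to the form. The field names and the certificate.pdf file name are hypothetical, and it assumes flat (non-hierarchical) field names; for nested names XFDF uses nested field elements.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Writes a minimal XFDF document pairing field names with values. */
class XfdfWriter {

    static String toXfdf(String formHref, Map<String, String> fields) {
        StringBuilder sb = new StringBuilder();
        sb.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        sb.append("<xfdf xmlns=\"http://ns.adobe.com/xfdf/\" xml:space=\"preserve\">\n");
        sb.append("  <f href=\"").append(escape(formHref)).append("\"/>\n"); // form to fill
        sb.append("  <fields>\n");
        for (Map.Entry<String, String> e : fields.entrySet()) {
            sb.append("    <field name=\"").append(escape(e.getKey())).append("\">")
              .append("<value>").append(escape(e.getValue())).append("</value></field>\n");
        }
        sb.append("  </fields>\n");
        sb.append("</xfdf>\n");
        return sb.toString();
    }

    /** Minimal XML escaping for element text and attribute values. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    public static void main(String[] args) {
        Map<String, String> values = new LinkedHashMap<>();
        values.put("RecipientName", "Jane Doe");
        values.put("CourseTitle", "PDF Basics & Beyond");
        System.out.print(toXfdf("certificate.pdf", values));
    }
}
```

Generating one such file per name would cover the certificate case without touching the PDF's bytes, subject to the Reader Enabled limitation described above.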
