Cloning a form field with all styles using PDFBox

We need to fill out a form for our customers and produce a PDF that they can print, sign and send back. As the PDF has to change frequently, we use PDF templates with form fields, which are filled by our tool (pdftk). For various reasons I would like to switch to PDFBox (one is that pdftk requires us to split the templates into individual pages, save them to disk, fill them and then merge them together again). So far everything works fine.
But I struggle with the page numbering. As the form is combined out of multiple templates, I have to fix the page number with PDFBox. So far, we used a styled input field page_num on every page. But since they all have the same name, I can't fill them individually.
Can I somehow split or clone the fields and give them individual names, so I can fill them individually? Of course, the styling should stay like it is.

With the help of the PDFBox guys I've got it to work. My solution is using JRuby, but I think you could pretty easily translate it to Java (remove the Java::OrgApachePdfbox... namespace).
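# Uses PDFBox 1.x calls such as getAllPages and PDTextbox.
doc = Java::OrgApachePdfboxPdmodel::PDDocument.load("input.pdf")
form = doc.getDocumentCatalog.getAcroForm
pages = doc.getDocumentCatalog.getAllPages.toArray.to_a
page_num = form.getField("page_num")
# Default appearance (/DA) string of the shared field; it carries the text styling.
string = page_num.getDictionary
  .getDictionaryObject(Java::OrgApachePdfboxCos::COSName::DA)
# Each kid is one widget of the shared field, i.e. one occurrence on a page.
page_num.getKids.to_array.each do |widget|
  widget_dict = widget.getDictionary
  # Copy the styling onto the widget so it is kept once the widget is its own field.
  widget_dict.setString(Java::OrgApachePdfboxCos::COSName::DA, string.getString)
  field = Java::OrgApachePdfboxPdmodelInteractiveForm::PDTextbox.new(
    form,
    widget_dict
  )
  field.setParent(page_num)
  # 1-based number of the page this widget sits on.
  page = (pages.index(widget.getPage) + 1).to_s
  field.setPartialName("page_num_#{page}")
  field.setValue(page)
end
doc.save("output.pdf")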
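The naming step inside the loop can be tried out without PDFBox. The following is only a sketch in plain JavaScript: `pages`, `widgets` and `uniqueFieldNames` are hypothetical stand-ins, not PDFBox API. It shows how each widget's position in the page list turns into a unique partial name and value.

```javascript
// Hypothetical stand-ins: a widget here only remembers the page it sits on.
function uniqueFieldNames(pages, widgets) {
  return widgets.map(function (widget) {
    // Same idea as pages.index(widget.getPage) + 1 in the JRuby code:
    // the 0-based position in the page list becomes a 1-based page number.
    var pageNumber = pages.indexOf(widget.page) + 1;
    return { name: "page_num_" + pageNumber, value: String(pageNumber) };
  });
}

var pages = [{ id: "cover" }, { id: "terms" }, { id: "signature" }];
var widgets = pages.map(function (page) { return { page: page }; });
var fields = uniqueFieldNames(pages, widgets);
console.log(fields.map(function (f) { return f.name; }).join(", "));
// prints "page_num_1, page_num_2, page_num_3"
```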

Related

Libre Office Labels don't show up as "AcroFields" in iTextSharp?

I've been trying to generate a report. I've tried quite a few things already, but there always seem to be problems. I'm currently trying iTextSharp 4.1.6.
My current strategy is to use LibreOffice to create a document with editable PDF fields, or I guess they are called "AcroFields". I'm not sure, since I can't find a definition. Anyway, I assume that all of these are "AcroFields":
But if I put all of those into a form and export as PDF, only some of them show up as AcroFields:
var reader = new PdfReader(File.ReadAllBytes("abc.pdf"));
foreach (var field in reader.AcroFields.Fields)
{
    Console.WriteLine(((DictionaryEntry)field).Key);
}
> Text Box 1
Check Box 1
Numeric Field 1
Formatted Field 1
Date Field 1
List Box 1
Combo Box 1
Push Button 1
Option Button 1
Notice how Label Field 1 is not present. If it were present then doing a text replace might be easy. Except it's not present so it's looking like even iText can't do a simple text replace in a pdf. Is this true? How would you replace text in a pdf document using iTextSharp?

> Notice how Label Field 1 is not present.
As there is no AcroForm form field type "label", form labels usually are drawn as regular page content in PDF files.
> If it were present then doing a text replace might be easy. Except it's not present so it's looking like even iText can't do a simple text replace in a pdf. Is this true?
Indeed, in general there is no simple text replacement in a PDF.
> How would you replace text in a pdf document using iTextSharp?
I would determine the bounding box coordinates of the text to replace using the iText text extraction feature with some extension that returns text plus coordinates. Then I'd remove that text by redaction using iText's PdfCleanUp... classes. Finally I'd add the replacement text as new text in the bounding box determined at start.
Unfortunately for you, both good text extraction and redaction are not present in your version 4.1.6; for this approach you should update at least to 5.5.x.
Alternatively, though, as you've been trying to generate a report, I assume the template design is in your hands. In that case you can put your labels into read-only text fields which you can change (they are read-only only to GUI users).

How do I fill out fillable PDF Form fields using 4gl?

I have a PDF form that I'm filling out with data using progress-4gl. To date, I've been only filling in text fields using the following syntax:
put stream stream1 unform
"^global CHX_SINGLE_CE_PLAN3" skip(0)
"X" skip
CHX_SINGLE_CE_PLAN3 is the field name...
This code works when dealing with text fields, but I'm trying to check a box instead of filling in a text field. I cannot find any documentation on this. Is checking a box on a fillable PDF form even possible with 4GL?

As far as I remember PDF Include has support for filling fillable forms. Whilst it's probably a bit over the top in terms of what you want to achieve, it's an open source project and so you may well find the answer to your question within the code itself.
Here's a link to the project page: http://www.oehive.org/pdfinclude

I discovered the answer, which I thought I had already tried before asking this question. The answer is you need to pass the value "Yes" (with capital "Y") in order to check the checkbox. The correct code in this instance is:
put stream stream1 unform
"^global CHX_SINGLE_CE_PLAN3" skip(0)
"Yes" skip
I believe this is the case no matter which language you're using.

How to vertically split a PDF e-book with collation (2-page per sheet to 1 sheet per page)

I have a scanned e-book with 2 pages per sheet. I was able to crop the white borders on all four sides of the e-book. Since two book pages sit on one single PDF page, it displays badly on an e-reader like the Kindle. I am trying to split the e-book to 1 page per sheet. Is there a way to do this in Acrobat Professional?
I thought of cropping the PDF in two batches (left and right) and merging them together, but the page collation would go off completely: the pages won't end up adjacent to each other. I will get 1, 3, 5, 7 up to 101 as one PDF and 2, 4, 6 ... 100 as another PDF.
Please provide me with a solution in Acrobat Professional.

You can merge the PDFs back together in script. When running from Acrobat, JS has access to quite a few functions that aren't available in Reader.
doc.insertPages(nPageInDoc, pathToOtherPDF, nStartPage, nEndPage)
So you could create a script in a button in one of your 1,3,5,7... files to import all the pages from the other. Something like:
var oddPagesDoc = app.openDoc("c:\\oddPages.pdf");
var evenPagesDoc = app.openDoc("c:\\evenPages.pdf");
var evenPageCount = evenPagesDoc.numPages;
for (var i = 0; i < evenPageCount; ++i) {
    oddPagesDoc.insertPages(i, "c:\\evenPages.pdf", i, i);
}
So insert a button into the "odd pages" file with the above script as the button's "mouse down" javascript action. Click. Delete the button.
It's entirely possible there's an "off by one" error in my script, so I don't recommend saving over the original until you're sure everything was assembled properly.

If you have Adobe Acrobat Pro, there is a way to do this without scripting. It's quite tedious, but I'll explain:
I advise you to make a copy of the file first.
1. Crop the left part of the page: select Crop pages; if it's an A4 sheet, then for Margin Controls use Right = 14.85 cm (half of the sheet's 29.7 cm long side); for Page Range select All. Save as left.pdf.
2. Extract all the left pages as separate files: in the file left.pdf, select Extract pages, edit the From and To boxes to select all the pages, and check the box Extract Pages As Separate Files. Now select a folder to save all the files in; it will name them left 1.pdf, left 2.pdf, left 3.pdf, etc.
3. Repeat step 1 for the right side of the page: open the original file, crop at Left = 14.85 cm for all pages and save as right.pdf.
4. Repeat step 2 for right.pdf to extract all the pages into the same folder as right 1.pdf, right 2.pdf, right 3.pdf, etc.
5. In Acrobat choose Create -> Combine Files into a Single PDF, and navigate to the folder where you've saved all the separate pages. You can rearrange the files into the correct order, i.e. left 1.pdf, right 1.pdf, left 2.pdf, right 2.pdf, etc. Then click Combine Files and save your new ebook.
Tip: if you have many pages in the book, it can take quite long to rearrange the files when combining them. Acrobat arranges them alphabetically, so it would be better if the files were named, for instance for a PDF with 354 pages, 001left.pdf, 001right.pdf, ..., 354left.pdf, 354right.pdf. I can't find any setting in Acrobat to change the default name, but you could use this free tool to batch rename files: http://www.snapfiles.com/get/denrenamer.html
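To see why zero-padded names help, here is a small sketch in plain JavaScript. It assumes the file list is ordered by a plain alphabetical string sort (I have not checked Acrobat's exact ordering rules), and `paddedName` and the sheet count of 12 are made up for illustration.

```javascript
// Build zero-padded names such as "01left.pdf"; the pad width follows the sheet count.
function paddedName(sheet, side, totalSheets) {
  var width = String(totalSheets).length;
  var number = String(sheet);
  while (number.length < width) {
    number = "0" + number;
  }
  return number + side + ".pdf";
}

var totalSheets = 12; // illustrative count
var names = [];
for (var sheet = 1; sheet <= totalSheets; sheet++) {
  // Pushed in the "wrong" order on purpose; sorting has to fix it.
  names.push(paddedName(sheet, "right", totalSheets));
  names.push(paddedName(sheet, "left", totalSheets));
}
names.sort(); // plain alphabetical sort
console.log(names.slice(0, 4).join(", "));
// prints "01left.pdf, 01right.pdf, 02left.pdf, 02right.pdf"

// The default names do not sort in page order once there are 10 or more pages.
var unpadded = ["left 2.pdf", "left 10.pdf", "left 1.pdf"].sort();
console.log(unpadded.join(", "));
// prints "left 1.pdf, left 10.pdf, left 2.pdf"
```

With the padded scheme each sheet's left half sorts directly before its right half, so no manual rearranging should be needed.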

The following modified script (based on the idea in the previous answer) worked for me:
// pageCount is not defined in this snippet; it stands for the number of pages to insert.
for (var i = 0; i < pageCount; i++) {
    this.insertPages({nPage: 2*i, cPath: "/C/***/fileName.pdf", nStart: i, nEnd: i});
}
Multiplying by 2 is needed because every page you insert shifts the numbering of all the pages after it by one.
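The shift can be checked outside Acrobat with arrays. This is only a model: it assumes, as far as I remember from the Acrobat JavaScript reference, that insertPages puts the new page after the 0-based index nPage. The helpers `insertAfter` and `interleave` are made up for the illustration.

```javascript
// Models insertPages: the new page goes AFTER the 0-based index nPage.
function insertAfter(pageList, nPage, page) {
  pageList.splice(nPage + 1, 0, page);
}

// Inserts every even page into a copy of the odd pages,
// using indexFor(i) as the insertion index for the i-th even page.
function interleave(oddPages, evenPages, indexFor) {
  var result = oddPages.slice();
  for (var i = 0; i < evenPages.length; i++) {
    insertAfter(result, indexFor(i), evenPages[i]);
  }
  return result;
}

var odd = [1, 3, 5, 7];
var even = [2, 4, 6, 8];
var doubled = interleave(odd, even, function (i) { return 2 * i; });
var plain = interleave(odd, even, function (i) { return i; });
console.log(doubled.join(" ")); // prints "1 2 3 4 5 6 7 8"
console.log(plain.join(" "));   // prints "1 2 4 6 8 3 5 7"
```

With the plain index i, every insertion after the first lands one page too early for each page already inserted, which is exactly what the factor of 2 compensates for.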

Using VBA in MS Word 2007 to define page elements?

I'd like to be able to create a page element which I can feed text and it will form itself into the preferred layout. For instance:
{MACRO DocumentIntro("Introduction to Business Studies", "FP015", "Teachers' Guide")}
With that as a field, the output should be a line, then the first two strings centred in a certain size and font, then another line, and then the third string with its own font and size, also centred.
I know that's sort of TeX-like and perhaps beyond the scope of VBA, but if anyone's got any idea how it might be possible, please tell!
EDIT:
OK, if I put the required information into Keywords, as part of the document properties, with some kind of unique separator, then that gets the info in, and the info will be unique to each document. Next, one puts a bookmark where the text is going to be displayed. Then one creates an AutoOpen macro that goes to that bookmark, pulls the relevant parts out of the keywords, and forms the text appropriately into the bookmark's .Selection.
Is that feasible?

You're certainly on the right track here for a coding solution. However, there is a simpler way with no code: this is the type of scenario that Content Controls (CC) in Word 2007 were built for, and Fields/Properties can be bound to them. These CCs can hold styles (like centered, bold, etc.). No VBA required.
The very easiest thing to do is to pick 3 built-in document properties that you will always want these to be. For example, "Title" could be your first string, "Subject" your second string and "Keywords" your third. Then, just go to the Insert ribbon, Quick Parts, Document Properties and insert, place and format those how you like. Then go to Word's start button (the orb thingy) and then under Prepare choose Properties. Here you can type, for example "Introduction to Business Studies", into the Title box and then just deselect it somehow (like click in another box). The Content Control for Title will be filled in automatically with your text.
If you want to use this for multiple files, just create this file as a .dotx (after CC insertion/placement/formatting and before updating the Document Properties' text). Then every time all you'll have to do is set these three properties with each new file.

Well, yes, it did turn out to be feasible.
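Sub autoopen()
    Dim sKeywords As String
    ' Index 4 is the built-in Keywords property (wdPropertyKeywords).
    sKeywords = ActiveDocument.BuiltInDocumentProperties(4)
    ' Jump to the bookmark and replace the selection with the keywords text.
    ActiveDocument.Bookmarks("foo").Select
    Selection.Text = sKeywords
End Sub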
Okay, I have some filling out to do, but at least the guts of it are there.
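The part still to fill out is pulling the three strings back out of the Keywords text. Here is a rough sketch of that step in plain JavaScript rather than VBA (where the built-in Split function would play the same role). The "|" separator and the names `parseIntro`, `title`, `code` and `subtitle` are assumptions made for the illustration.

```javascript
// Hypothetical separator; any character that never occurs in the text would do.
var SEPARATOR = "|";

// Splits the Keywords text into the three strings the layout needs.
function parseIntro(keywords) {
  var parts = keywords.split(SEPARATOR).map(function (part) {
    return part.trim();
  });
  return { title: parts[0], code: parts[1], subtitle: parts[2] };
}

var intro = parseIntro("Introduction to Business Studies | FP015 | Teachers' Guide");
console.log(intro.title + " / " + intro.code + " / " + intro.subtitle);
```

Each of the three parts could then be written to the bookmark and formatted separately.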

Why does every form field in my generated PDF end with "-0"?

So I have the following VB.NET code that creates a form field in a PDF using SyncFusion's Essential PDF module:
Dim pdfField As New Pdf.Interactive.PdfTextBoxField(pdfDoc.Pages(iPage), "txt1")
pdfField.Location = New PointF(50, 50)
pdfField.Size = New SizeF(100, 10)
pdfDoc.Form.Fields.Add(pdfField)
This works great except for one thing. When I open up the PDF in Acrobat and look at the field name I notice that it says "txt1-0". Now I can't figure out where the "-0" is coming from and how to get rid of it.
This may be a SyncFusion issue, in which case I hope I get an answer from them soon (I've asked this on their forum). But I thought it might also be a fundamental detail about PDFs and naming that I don't know about.

Ah ha, I just found out what was causing this.
Previously I was using both the PdfLoadedDocument and PdfDocument classes. I was loading the PdfLoadedDocument into the PdfDocument via ImportPages and apparently this process will add the "-0" suffix to the field names.
I found that in my case I can get rid of the PdfDocument object and just use PdfLoadedDocument and that fixed it.
UPDATE:
Just to expand on this, I've found that it's actually the PdfDocument.Form.FieldAutoNaming property that controls this. Its default value is true. When it's set to true, it automatically adds suffixes as needed to prevent duplicate field names. If you set it to false, it won't add the "-0" suffix anymore; instead you might get errors in your code.
