Why is PDF form information stored on both 'Root.AcroForm.Fields' & 'Root.Pages.Kids[0].Annots'

If I update the value of a form in either of these locations, both are affected. Why are they stored twice?
When updating these forms, is one preferred to be used over the other one?
(I'm using Python library pdfrw)
'/Root': {
    '/AcroForm': {'/Fields': [(10, 0), (11, 0)]},
    '/Pages': {'/Kids': [{'/Annots': [(10, 0), (11, 0)]}]}
}

The AcroForm dictionary references all abstract form fields (directly or indirectly) to allow immediate access to all fields of a document.
Each abstract form field may have any number of widget annotations (except signature fields with at most one annotation).
Widget annotations are for displaying the form field contents. Thus, they must be attached to the page they respectively are displayed upon. So they are referenced from the Annots of the respective page.
If a form field has no widget annotation, you cannot find it from any page.
If a form field has exactly one widget annotation, you can usually find it from exactly one page, the page that annotation is on. In this case the form field object and the widget annotation object may be merged into a single object.
If a form field has more than one widget annotation, you can usually find it from one or more pages, depending on whether all those annotations are on the same page or on different pages.
Thus, to answer the question
Why are they stored twice?
They are not stored twice: each form field is stored only once, in one PDF object. But that form field object can usually be reached from multiple locations in the object model: from the global AcroForm object and from the Annots of each page on which that form field has a widget.
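The point can be sketched with plain Python dictionaries standing in for PDF objects. This is a simplified model and not pdfrw's actual API; the field names and values are made up for illustration. Both the Fields array and the page's Annots array hold references to the same two objects, so a change made through either path is visible through the other.

```python
# Toy model: plain dicts stand in for PDF objects (this is NOT the pdfrw API).
# Objects "10 0" and "11 0" are modelled as merged field/widget dictionaries;
# their contents are invented for the example.
field_10 = {"/T": "(name)", "/FT": "/Tx", "/Subtype": "/Widget", "/V": "()"}
field_11 = {"/T": "(city)", "/FT": "/Tx", "/Subtype": "/Widget", "/V": "()"}

root = {
    "/AcroForm": {"/Fields": [field_10, field_11]},
    "/Pages": {"/Kids": [{"/Annots": [field_10, field_11]}]},
}

via_acroform = root["/AcroForm"]["/Fields"][0]
via_page = root["/Pages"]["/Kids"][0]["/Annots"][0]

# Both paths lead to the very same object, not to two copies.
same_object = via_acroform is via_page

# Updating the value through one path changes what the other path sees.
via_acroform["/V"] = "(Alice)"
value_seen_from_page = via_page["/V"]
```

In this model it does not matter which path is used for the update, because there is only one object to update.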

Related

PDF form fields: Separate/Extract widget dictionary from field dictionary

According to the PDF spec it is possible to merge the widget dictionary and the field dictionary when there is only one associated widget annotation. Is there some support by iText / openPDF to separate the two again? (Low level API would suffice).
Update: OK, so there seems to be no convenient method for it. But what about the following entries, which exist in both dictionaries:
AA (additional actions) are defined in (widget) annotation dictionary and in the field dictionary - so when separating where to put it?
Parent - both field and annotation define a parent - so when separating where to put it?

iTextSharp PdfCopy makes read-only fields editable

I'm working on some code that concatenates PDF files using iTextSharp. I'm having a problem with a particular PDF that contains some read-only fields and a field that is editable (I believe they're AcroFields). In the output file all of the fields are editable.
Here is the code that I use (I've simplified it to read only one PDF):
public static void Concat(string outputFilePath, string inputFilePath)
{
    using (var document = new Document())
    {
        using (var fileStream = new FileStream(outputFilePath, FileMode.Create, FileAccess.ReadWrite))
        using (var copier = new PdfCopy(document, fileStream))
        {
            copier.SetMergeFields();
            document.Open();
            var reader = new PdfReader(inputFilePath);
            copier.AddDocument(reader);
            copier.AddJavaScript(reader.JavaScript);
            copier.Close();
        }
        document.Close();
    }
}
Any ideas on how to preserve the attributes of the fields?
It looks like iText and Adobe Reader interpret the form field structure differently. E.g. look at this parent field with one child:
(Object 24 is referenced from the Fields array of the AcroForm dictionary. Object 130 is referenced from the Annots array of the page dictionary.)
So we have two field objects named PageDataCollection1[0].txtCity, the objects 24 and 130, the widget annotation being merged into the latter.
iText considers the terminal field object (object 130) to be completely in charge, using its Ff value 0 which among other things means not read-only.
Adobe Reader, on the other hand, considers the terminal field object (object 130) only to be partially in charge, using its DA value but not its Ff value. Instead the parent Ff value 1 is used which among other things means read-only.
In the course of copying the document pages, the hierarchies are flattened making the different interpretation visible.
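The two readings can be contrasted with a small Python model. This is only a sketch of the behavior described above, not code from iText or Adobe Reader. The dictionaries mimic objects 24 and 130 with just the entries that matter; the partial name, the DA placeholder and the function names are assumptions made for the example.

```python
READ_ONLY = 1  # bit position 1 of the field flags (Ff) marks a field read-only

# Simplified stand-ins for the two objects discussed above.
obj_24 = {"T": "txtCity", "Ff": 1, "Kids": []}            # parent field
obj_130 = {"FT": "Tx", "Ff": 0, "DA": "(placeholder)",    # kid with merged widget
           "Parent": obj_24}
obj_24["Kids"].append(obj_130)


def ff_terminal_in_charge(field):
    """First reading: the terminal field's own Ff wins; ancestors are only
    consulted when Ff is missing (assumed fallback for this sketch)."""
    while field is not None:
        if "Ff" in field:
            return field["Ff"]
        field = field.get("Parent")
    return 0


def ff_parent_in_charge(field):
    """Second reading: a kid without a partial name (T) may not override Ff,
    so the value of the nearest named ancestor is used."""
    while "T" not in field and field.get("Parent") is not None:
        field = field["Parent"]
    return field.get("Ff", 0)


editable_first_reading = not (ff_terminal_in_charge(obj_130) & READ_ONLY)
editable_second_reading = not (ff_parent_in_charge(obj_130) & READ_ONLY)
```

Under the first reading the field comes out editable, under the second it comes out read-only, which matches the difference observed between the copied file and the original.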
Ad hoc I would say the behavior of iText is correct here.
The behavior of Adobe Reader might be justified with this section from the specification ISO 32000-1:
It is possible for different field dictionaries to have the same fully qualified field name if they are descendants of a common ancestor with that name and have no partial field names (T entries) of their own. Such field dictionaries are different representations of the same underlying field; they should differ only in properties that specify their visual appearance. In particular, field dictionaries with the same fully qualified field name shall have the same field type (FT), value (V), and default value (DV).
(section 12.7.3.2 Field Names)
Maybe Adobe Reader tries to enforce that different representations of the same field only differ in properties that specify their visual appearance, by ignoring other properties in descendant fields without partial field names.
As there are no different representations of the field, though, there is no need for this measure here.
There is an alternative interpretation of the object structure here, which @rhens proposed:
There aren't 2 fields with the same name: object 24 is the field dictionary, object 130 is the widget annotation.
IMO this interpretation does not match the PDF specification even though it would explain the behavior of Adobe Reader.
While indeed the Kids array of a form field may contain either child fields or widgets, the object 130 in my opinion has to be considered a field (which has its own widget merged into itself) rather than a widget of field object 24.
To check whether some kid dictionary object is a child field or merely a widget, it does not suffice to find widget-specific entries in the kid: such entries can also be in a child field which has its single widget merged into itself. Thus, one instead has to check for field-specific entries in the kid.
In the case at hand the kid object 130 does have field-specific entries (foremost the field type FT but also the field flags Ff) and, therefore, should be considered a child field.
That all being said, it is indeed possible that Adobe considers that object a mere widget (which, as mentioned above, would explain the behavior). This interpretation would not be inspired by the specification, though, as explained above. But it might be inspired by a non-negligible number of documents from the wild which erroneously have additional field-specific entries in their plain widgets and require this interpretation to be displayed as designed.
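The check described above, looking for field-specific entries rather than widget-specific ones, might be sketched like this in Python. The key set is an illustrative subset chosen for this example, not an authoritative list from the specification, and the two sample dictionaries are made up.

```python
# Entries that only make sense on a field. Illustrative subset chosen for
# this sketch; Parent and AA are left out because widget annotations can
# carry them too.
FIELD_ONLY_KEYS = {"FT", "T", "TU", "TM", "Ff", "V", "DV", "Kids"}


def classify_kid(kid):
    """Return 'field' if the kid has any field-specific entry, else 'widget'."""
    return "field" if kid.keys() & FIELD_ONLY_KEYS else "widget"


# A plain widget: only annotation entries.
plain_widget = {"Subtype": "Widget", "Rect": [0, 0, 100, 20]}

# A child field with its single widget merged in: annotation entries
# plus field entries such as FT and Ff (like object 130 above).
merged_field = {"Subtype": "Widget", "Rect": [0, 0, 100, 20], "FT": "Tx", "Ff": 0}

kinds = [classify_kid(plain_widget), classify_kid(merged_field)]
```

Testing for `Subtype` or `Rect` would not distinguish the two samples, since both carry them; only the field-specific entries do.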

Concrete form bean for html select

I have used <html:select> in one of my JSP pages with multiple selection. What variables, for storing the selection, should I have in the form bean associated with this list?
Straight from the documentation:
This tag operates in two modes, depending upon the state of the
multiple attribute, which affects the data type of the associated
property you should use:
multiple="true" IS NOT selected - The corresponding property should be a scalar value of any supported data type.
multiple="true" IS selected - The corresponding property should be an array of any supported data type.
(emphasis mine)

Initialize a Struts 1 ActionForm with Data from HttpSession Object

I've done this a half dozen times so I know it's possible. I just can't remember how.
I would like to initialize a property of a Struts 1 ActionForm with data from the user's HttpSession object, but only when the form is first created. Actually don't worry too much about the fact that it comes from HttpSession, important is just the fact that the data is dynamic, per-user, and should only be initialized once.
Additionally, if the user changes the data in this field, the user's entry should persist. In other words, when the user first sees the form they will see the initialized data. If they then change the field and submit the form (by calling the associated action) and subsequently come back to this form later, they should see THEIR entry in that field.
Obviously initializing the field in struts-config.xml won't work because the data is dynamic and per-user. Same can be said for the form's constructor. I see the reset() method of ActionForm will be called to reset properties to a default state, but I don't remember if it is called before the first time the form is loaded and displayed in the page. I suppose if it is that's an option, but I would only want it to do the initialization on the first call. That sounds just mildly complicated (I would need a boFirstTime member variable flag or something?).
Can anyone help?
What I ended up doing was overriding reset() of ActionForm, and setting the desired property only if it is null or blank. The property I needed to initialize is represented in the class member variable _strMailTo (yeah I know nobody but me uses the underscores for member variables anymore).
It turns out that reset() is also called before the ActionForm properties are used for the first time to populate the fields of the form for the associated Action. In this way, the first time the user sees the form, my pre-populated data is there. But if they change it and later land on the form again, they see the text they put in the field the last time they submitted the form.
I guess maybe I'm also the only one still using Struts 1 anymore...
@Override
public void reset(ActionMapping mapping, HttpServletRequest request) {
    // Initialize only while the property is still empty, so the user's own entry is kept.
    if (_strMailTo == null || _strMailTo.equals("")) {
        String strRemoteUser = request.getRemoteUser();
        _strMailTo = chOps.UtilityUsers.getEmail(strRemoteUser);
    }
}

How to use GtkTreeView correctly

I am using a TreeView with a ListStore as model. When the user clicks on a row I want to take some action but not using the values in the cells, but using the data I created the row from...
Currently I have the TreeView, the TreeModel (ListStore) and my own data (which I ironically call model).
So the Questions are:
Is it "right" to have a model - an object representation of the data I want to display and fill a ListStore with that data to display in a TreeView, or would it be better to implement an own version of TreeModel (wrapping my data-model) to display the data?
And also:
If someone double-clicks in a row I can get the RowActivated event (using C#/Gtk#) which provides a Path to the activated row. With that I can get a TreeIter and using that I can get the value of a cell. But what is the best practice to find the data object from which the row was constructed in the first place?
(Somehow this question got me to the first one, by wondering whether getting the data object would be easier if I tried to implement my own TreeModel...)
It's quite awkward/difficult to implement TreeModel, so most people simply synch the data from their "real" model into a TreeStore or ListStore.
The columns in the store do not have to match the columns in the view in any way. For example, you can have a column that contains your real managed data objects.
When you add a cellrenderer to a TreeView (visual) column, you can add mappings between its properties and the columns of the store. For example, you could map one store column to the font of a text cellrenderer, and another store column to the text property of the same cellrenderer. Each time the cellrenderer is used to render a particular cell, the mappings will be used to retrieve the values from the store and apply them to the properties of the renderer before it renders.
Here's an example of a mapping:
treeView.AppendColumn ("Title", renderer, "text", 0, "editable", 4);
This maps store column 0 to the renderer's text GTK property and maps store column 4 to the editable property. For GTK property names you can check the GTK docs. Note that the example above uses a convenience method that adds a column, adds a renderer to it and adds an arbitrary number of mappings via params. To add mappings directly to a column, for example a column with multiple renderers, pack the renderers into the column, then use TreeViewColumn.AddAttribute or TreeViewColumn.SetAttributes.
You can also set up a custom data function that will be used instead of mappings. This allows you to set the properties of the renderer directly, given a TreeIter and the store - so, if all the data you want to display is trivially derived from your real data objects, you could even have your store only contain a single column of these objects, and use data funcs for all the view columns.
Here's an example of a data func that does exactly what the mapping example above does:
treeColumn.SetCellDataFunc (renderer, delegate (TreeViewColumn col,
    CellRenderer cell, TreeModel model, TreeIter iter)
{
    var textCell = (CellRendererText) cell;
    textCell.Text = (string) model.GetValue (iter, 0);
    textCell.Editable = (bool) model.GetValue (iter, 4);
});
Obviously data functions are much more powerful because they enable you not only to use properties of more complex GTK objects, but also to implement more complex display logic - for example, lazily processing derived values only when the cell is actually rendered.
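The idea of keeping the real objects in the store and deriving the display from them can be modelled without GTK. The following Python sketch is a toy stand-in, not Gtk# or PyGObject code: `Contact`, `rows`, `title_data_func` and `on_row_activated` are hypothetical names, and a list index plays the role of the tree path.

```python
from dataclasses import dataclass


@dataclass
class Contact:
    """Hypothetical 'real' data object behind each row."""
    name: str
    email: str


# The store has a single column that holds the data objects themselves.
OBJECT_COLUMN = 0
rows = [
    [Contact("Ada", "ada@example.org")],
    [Contact("Linus", "linus@example.org")],
]


def title_data_func(row):
    """Plays the role of a cell data func: derive the text at render time."""
    contact = row[OBJECT_COLUMN]
    return f"{contact.name} <{contact.email}>"


def on_row_activated(path):
    """Plays the role of a RowActivated handler: path -> row -> data object."""
    return rows[path][OBJECT_COLUMN]


rendered = [title_data_func(row) for row in rows]
activated = on_row_activated(1)
```

With this layout the handler never has to reconstruct the object from displayed cell values; it reads it straight out of the object column.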