I have a PDF document that has several hundred fields. All of the field names have periods in them, such as "page1.line1.something".
I want to remove these periods and replace them with either an underscore or (better) nothing at all.
There appears to be a bug in the iTextSharp library where the RenameField method does not work if the field name contains a period, so the following does not work (it always returns False):
Dim formfields As AcroFields = stamper.AcroFields
Dim renametest As Boolean
renametest = formfields.RenameField("page1.line1.something", "page1_line1_something")
If the field does not have a period in it, it works fine.
Has anyone come across this and is there a workaround?
Is this an AcroForm form or a LiveCycle Designer (XFA) form?
If it's XFA (which is likely given the field names), iText can't help you. It can only get/set field values when working with XFA.
Okay, an AcroForm. Rather than go the route used in your source, I suggest you directly manipulate the existing field dictionaries and the acroForm field list.
I'm a Java native when it comes to iText, so you'll have to do some translation, but here goes:
A) Delete the AcroForm's field array. Leave the calculation order alone if present (/CO). I think.
PdfDictionary acroDict = reader.getCatalog().getAsDictionary(PdfName.ACROFORM);
acroDict.remove(PdfName.FIELDS);
B) Attach all the 'top level' fields to a new FIELDS array.
PdfArray newFldArray = new PdfArray();
acroDict.put(PdfName.FIELDS, newFldArray);
// you could wipe this between pages to speed things up a bit
Set<PdfIndirectReference> radioFieldsAdded = new HashSet<PdfIndirectReference>();
int numPages = reader.getNumberOfPages();
for (int curPg = 1; curPg <= numPages; ++curPg) {
    PdfDictionary curPageDict = reader.getPageN(curPg);
    PdfArray annotArray = curPageDict.getAsArray(PdfName.ANNOTS);
    if (annotArray == null)
        continue;
    for (int annotIdx = 0; annotIdx < annotArray.size(); ++annotIdx) {
        PdfIndirectReference fieldReference = annotArray.getAsIndirectObject(annotIdx);
        PdfDictionary field = (PdfDictionary) PdfReader.getPdfObject(fieldReference);
        // if it's a radio button (/Ff may be absent, so check for null first)
        PdfNumber fieldFlags = field.getAsNumber(PdfName.FF);
        if (fieldFlags != null && (PdfFormField.FF_RADIO & fieldFlags.intValue()) != 0) {
            fieldReference = field.getAsIndirectObject(PdfName.PARENT);
            field = field.getAsDict(PdfName.PARENT); // looks up indirect reference for you.
            // only add each radio field once.
            if (radioFieldsAdded.contains(fieldReference)) {
                continue;
            } else {
                radioFieldsAdded.add(fieldReference);
            }
        }
        // you'll need to assemble the original field name manually and replace the bits
        // you don't like: parent.T + '.' + child.T + '.' + ...
        // Do this before removing /Parent, because the name is built by walking up the parent chain.
        String newFieldName = SomeFunction(field);
        field.remove(PdfName.PARENT);
        field.put(PdfName.T, new PdfString(newFieldName));
        // add the reference, not the dictionary
        newFldArray.add(fieldReference);
    }
}
C) Clean up
reader.removeUnusedObjects();
Disadvantage:
More Work.
Advantages:
Maintains all field types, attributes, appearances, and doesn't change the file as a whole all that much. Less CPU & memory.
Your existing code ignores field script, all the field flags (read only, hidden, required, multiline text, etc), lists/combos, radio buttons, and quite a few other odds and ends.
If you use periods in your field name, only the last part can be renamed; e.g. in page1.line1.something only "something" can be renamed. This is because "page1" and "line1" are treated by Adobe as parents of the "something" field.
I needed to delete this hierarchy and replace it with a flattened structure.
I did this by:
creating a PdfDictionary object for each field
reading the annotations I needed for each field into an array
deleting the field hierarchy in my (PdfStamper) document
creating a new set of fields from my array data
I have created some sample code for this if you want to see how I did it.
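The name flattening itself is plain string work. Here is a minimal sketch in Java, assuming a hypothetical FieldNode class (not an iText type) that stands in for a field dictionary with a partial name (/T) and an optional parent. It walks up the parent chain to build the fully qualified name, then replaces the periods.

```java
/** Hypothetical stand-in for a PDF field dictionary: a partial name (/T) plus an optional parent. */
final class FieldNode {
    final String partialName;
    final FieldNode parent; // null for a top-level field

    FieldNode(String partialName, FieldNode parent) {
        this.partialName = partialName;
        this.parent = parent;
    }
}

final class FieldNameFlattener {
    /** Builds the fully qualified name by walking up the parent chain, e.g. "page1.line1.something". */
    static String qualifiedName(FieldNode field) {
        StringBuilder name = new StringBuilder(field.partialName);
        for (FieldNode p = field.parent; p != null; p = p.parent) {
            name.insert(0, p.partialName + ".");
        }
        return name.toString();
    }

    /** Replaces every period so the result can be used as a single top-level field name. */
    static String flatten(FieldNode field, String replacement) {
        return qualifiedName(field).replace(".", replacement);
    }

    public static void main(String[] args) {
        FieldNode page = new FieldNode("page1", null);
        FieldNode line = new FieldNode("line1", page);
        FieldNode leaf = new FieldNode("something", line);
        System.out.println(flatten(leaf, "_")); // page1_line1_something
    }
}
```

Passing an empty string as the replacement drops the periods entirely. With several hundred fields it is worth checking that the flattened names are still unique before writing them back.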
I've got a system that generates and automatically maintains lots of spreadsheets on a Drive account.
Whenever I add data to the sheet I run a 'format' method to pass over and make sure everything is ok.
This generally does things like:
set the default font and size across the sheet
set up the heading row
freeze rows
In addition, I have the code below to make sure the first two columns (index 0 and 1) in the sheet auto-resize to fit their contents. When I run it, though, this request doesn't seem to make a difference. The font, freezing, etc. all work.
Other notes:
I only want those 2 columns to auto-resize
the number of rows in a sheet can vary
this job is appended to the end of several in requestList
My code:
requestList.Requests.Add(new Google.Apis.Sheets.v4.Data.Request()
{
    AutoResizeDimensions = new AutoResizeDimensionsRequest()
    {
        Dimensions = new DimensionRange()
        {
            SheetId = Convert.ToInt32(sheetId),
            Dimension = "COLUMNS",
            StartIndex = 0,
            EndIndex = 1
        }
    }
});
var updateRequest = sheetService.Spreadsheets.BatchUpdate(requestList, spreadSheetId);
var updateResponse = updateRequest.Execute();
Could the order in which I request the 'format' changes be affecting things? Can anyone help?
As written in the documentation,
the start index is inclusive and the end index is exclusive.
So, for the first two columns, it should be:
startIndex = 0,
endIndex = 2
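A minimal sketch of the half-open convention, written in Java and independent of the Sheets client library. The ColumnRange class is hypothetical; it only illustrates that such a range covers endIndex - startIndex columns, so the range in the question (0 to 1) touches a single column.

```java
/** Hypothetical helper illustrating a half-open range: startIndex inclusive, endIndex exclusive. */
final class ColumnRange {
    final int startIndex;
    final int endIndex;

    ColumnRange(int startIndex, int endIndex) {
        if (startIndex < 0 || endIndex < startIndex) {
            throw new IllegalArgumentException("expected 0 <= startIndex <= endIndex");
        }
        this.startIndex = startIndex;
        this.endIndex = endIndex;
    }

    /** Range covering the first n columns: [0, n). */
    static ColumnRange firstColumns(int n) {
        return new ColumnRange(0, n);
    }

    /** Number of columns covered. */
    int size() {
        return endIndex - startIndex;
    }

    /** True if the zero-based column index falls inside the range. */
    boolean contains(int column) {
        return column >= startIndex && column < endIndex;
    }

    public static void main(String[] args) {
        ColumnRange original = new ColumnRange(0, 1); // the range from the question
        ColumnRange fixed = ColumnRange.firstColumns(2);
        System.out.println(original.size() + " column vs " + fixed.size() + " columns");
        System.out.println("fixed range contains column 1: " + fixed.contains(1));
    }
}
```

A quick rule of thumb: to cover the first n columns, the end index is n, not n - 1.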
I saw an example of extracting all available terms for a field here.
It doesn't fit my purposes because terms and stored values are different, e.g. a stored value of "black cat" will be represented as two terms, "black" and "cat". In my code I need to extract whole stored values, in this case "black cat".
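To make the distinction concrete, here is a small Java sketch. It assumes a simple lowercase-and-split-on-whitespace tokenizer as a stand-in for whatever analyzer the index actually uses; it does not call Lucene, and the TermsVsStored class is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class TermsVsStored {
    /** Assumed stand-in for an analyzer: lowercases and splits on whitespace. */
    static List<String> toTerms(String storedValue) {
        return Arrays.asList(storedValue.trim().toLowerCase().split("\\s+"));
    }

    public static void main(String[] args) {
        String stored = "black cat";
        System.out.println("stored value: " + stored);
        System.out.println("terms: " + toTerms(stored));
    }
}
```

The stored value stays one string, while the indexed form is a list of separate terms, which is why enumerating terms cannot give back the original values.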
Yes, you could do that. I'm not a C# programmer, but hopefully you will understand the Java code.
IndexReader reader = DirectoryReader.open(dir);
final int len = reader.maxDoc();
for (int i = 0; i < len; ++i) {
    Document document = reader.document(i);
    List<IndexableField> fields = document.getFields();
    for (IndexableField field : fields) {
        if (field.fieldType().stored()) {
            System.out.println(field.stringValue());
        }
    }
}
So, basically, I traverse all the docs, get all of their fields, and, if a field is stored, get its data. You can filter by field name to keep only the fields you need.
The full test can be found here: https://raw.githubusercontent.com/MysterionRise/information-retrieval-adventure/master/src/main/java/org/mystic/GetAllStoredFieldValues.java (along with proof that it works correctly).
I have a docx file which contains a footnote. There is a placeholder in the footnote text that needs to be replaced. When I extracted the nodes and modified their text values, that placeholder was skipped. I think that, for some reason, the text in the footnote is not being picked up.
Can you please guide me on how to replace a placeholder in a footnote?
Approach 1
faster if you haven't yet caused unmarshalling to occur:
FootnotesPart fp = wordMLPackage.getMainDocumentPart().getFootnotesPart();
fp.variableReplace(mappings);
Approach 2
FootnotesPart fp = wordMLPackage.getMainDocumentPart().getFootnotesPart();
// unmarshallFromTemplate requires string input
String xml = XmlUtils.marshaltoString(fp.getJaxbElement(), true);
// Do it...
Object obj = XmlUtils.unmarshallFromTemplate(xml, mappings);
// Inject result into docx
fp.setJaxbElement((CTFootnotes) obj);
Since @JasonPlutext's answer did not work for my case, I am posting what worked for me:
FootnotesPart fp = template.getMainDocumentPart().getFootnotesPart();
List<Object> texts = fp.getJAXBNodesViaXPath("//w:t", true);
for (Object obj : texts) {
    Text text = (Text) ((JAXBElement) obj).getValue();
    String textValue = text.getValue();
    // do variable replacement
    text.setValue(textValue);
}
However, I still face an issue when exporting this as PDF using Docx4J.toPDF(..):
the output does not pick up the footnote reference.
I am using PDFsharp and have only one issue: I cannot rename form fields. We have a field called 'x', and after an operation is applied to field 'x', it needs to be renamed to field 'y'.
I have seen tons of documentation on how to do this using iTextSharp. Unfortunately my firm cannot use it, so I am looking for a solution using PDFsharp.
Any ideas?
This can give you an idea of how to perform the field renaming:
var uniqueIndex = Guid.NewGuid();
var fields = pdfDocument.AcroForm.Fields;
var fieldNames = fields.Names;
for (int idx = 0; idx < fieldNames.Length; ++idx)
{
    var fieldName = fieldNames[idx];
    var field = fields[fieldName];
    field.Elements.SetName($"/{fieldName}", $"{fieldName}_{uniqueIndex}");
}
I was able to rename a form field via PDFsharp as follows:
public void RenameAcroField(PdfAcroField field, string newFieldName)
{
    field.Elements.SetString("/T", newFieldName);
}
A little bit tricky, but it worked for my case. Hope it helps.
VB.NET version for PDFsharp 1.50.5147:
Dim i = 0
While i < pdfDoc.AcroForm.Fields.Count
    pdfDoc.AcroForm.Fields(i).Elements.SetString("/T", "formField" & i)
    i += 1
End While
I'm using VTD-XML to update XML files. In this I am trying to get a flexible way of maintaining attributes on an element. So if my original element is:
<MyElement name="myName" existattr="orig" />
I'd like to be able to update it to this:
<MyElement name="myName" existattr="new" newattr="newValue" />
I'm using a Map to manage the attribute/value pairs in my code and when I update the XML I'm doing something like the following:
private XMLModifier xm = new XMLModifier();
xm.bind(vn);
for (String key : attr.keySet()) {
    int i = vn.getAttrVal(key);
    if (i != -1) {
        xm.updateToken(i, attr.get(key));
    } else {
        xm.insertAttribute(key + "='" + attr.get(key) + "'");
    }
}
vn = xm.outputAndReparse();
This works for updating existing attributes. However, when the attribute doesn't already exist, it hits the insert (insertAttribute) and I get a ModifyException:
com.ximpleware.ModifyException: There can be only one insert per offset
at com.ximpleware.XMLModifier.insertBytesAt(XMLModifier.java:341)
at com.ximpleware.XMLModifier.insertAttribute(XMLModifier.java:1833)
My guess is that, as I'm not manipulating the offset directly, this might be expected. However, I can see no function to insert an attribute at a given position in the element (at the end).
My suspicion is that I will need to do it at the "offset" level using something like xm.insertBytesAt(int offset, byte[] content). As this is an area I have not needed to get into yet: is there a way to calculate the offset at which I can insert (just before the end of the tag)?
Of course I may be mis-using VTD in some way here - if there is a better way of achieving this then happy to be directed.
Thanks
That's an interesting limitation of the API I hadn't encountered yet. It would be great if vtd-xml-author could elaborate on technical details and why this limitation exists.
As a solution to your problem, a simple approach would be to accumulate your key-value pairs to be inserted as a String, and then to insert them in a single call after your for loop has terminated.
I've tested that this works as per your code:
private XMLModifier xm = new XMLModifier();
xm.bind(vn);
String insertedAttributes = "";
for (String key : attr.keySet()) {
    int i = vn.getAttrVal(key);
    if (i != -1) {
        xm.updateToken(i, attr.get(key));
    } else {
        // Store the key-values to be inserted as attributes
        insertedAttributes += " " + key + "='" + attr.get(key) + "'";
    }
}
if (!insertedAttributes.equals("")) {
    // Insert attributes only once
    xm.insertAttribute(insertedAttributes);
}
This will also work if you need to update the attributes of multiple elements: simply nest the above code in while(autoPilot.evalXPath() != -1) and be sure to set insertedAttributes = ""; at the end of each iteration of the while loop.
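The string-building part can be pulled out and tested on its own. The sketch below is plain Java with no VTD-XML calls; the AttributeBatcher class and its methods are hypothetical names. It also escapes attribute values, which the code above does not do. That is an added assumption on my part: the inserted text goes into the document as-is, so a value containing a quote or an ampersand would otherwise break the XML.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

final class AttributeBatcher {
    /**
     * Concatenates every wanted attribute that is not already on the element
     * into one string, so it can be passed to a single insert call.
     */
    static String newAttributes(Map<String, String> wanted, Set<String> existingNames) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : wanted.entrySet()) {
            if (!existingNames.contains(e.getKey())) {
                sb.append(' ').append(e.getKey())
                  .append("=\"").append(escape(e.getValue())).append('"');
            }
        }
        return sb.toString();
    }

    /** Escapes characters that cannot appear verbatim in a double-quoted XML attribute value. */
    static String escape(String value) {
        return value.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
    }

    public static void main(String[] args) {
        Map<String, String> wanted = new LinkedHashMap<>();
        wanted.put("existattr", "new");
        wanted.put("newattr", "newValue");
        // "name" and "existattr" are already on the element, so only newattr is emitted
        System.out.println(newAttributes(wanted, Set.of("name", "existattr")));
    }
}
```

The result would be passed to xm.insertAttribute(...) once per element, and only when it is non-empty, exactly as in the loop above.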
Hope this helps.