pdfbox cannot find symbol - pdf

I'm using code suggested a while ago by #ASu :
package pdf_form_filler;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDDocumentCatalog;
import org.apache.pdfbox.pdmodel.interactive.form.*;
import java.io.File;
import java.util.*;
public class pdf_form_filler {
public static void listFields(PDDocument doc) throws Exception {
PDDocumentCatalog catalog = doc.getDocumentCatalog();
PDAcroForm form = catalog.getAcroForm();
List<PDFieldTreeNode> fields = form.getFields();
for(PDFieldTreeNode field: fields) {
Object value = field.getValue();
String name = field.getFullyQualifiedName();
System.out.print(name);
System.out.print(" = ");
System.out.print(value);
System.out.println();
}
}
public static void main(String[] args) throws Exception {
File file = new File("test.pdf");
PDDocument doc = PDDocument.load(file);
listFields(doc);
}
}
However, i keep getting a cannot find symbol error for PDFieldTreeNode. I have the latest pdfbox (2.0.4) but I cannot find the class in it anyway. I tried using this with PDField instead but then get error for .getValue

To get all terminal fields, do this:
Iterator<PDField> fieldIterator = catalog.getAcroForm();
while (fieldIterator.hasNext())
{
PDField pdField = fieldIterator.next();
// do stuff with the field
}
form.getFields() only gets the top level fields, including nonterminal fields (i.e. fields with no value but children).
A more advanced example can be found in the PrintFields.java example.

Related

Cleaning up unused images in PDF page resources

Please forgive me if this has been asked but I have not found any matches yet.
I have some PDF files where images are duplicated on each page's resources but never used in its content stream. I think this is causing the PDFSplit command to create very bloated pages. Is there any utility code or examples to clean up unused resources like this? Maybe a starting point for me to get going?
I was able to clean up the resources for each page by gathering a list of the images used inside the page's content stream. With the list of images, I then check the resources for the page and remove any that weren't used. See the PageExtractor.stripUnusedImages below for implementation details.
The resource object was shared between pages so I also had to make sure each page had its own copy of the resource object before removing images. See PageExtractor.copyResources below for implementation details.
The page splitter:
package org.apache.pdfbox.examples;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.interactive.action.PDAction;
import org.apache.pdfbox.pdmodel.interactive.action.PDActionGoTo;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationLink;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.destination.PDDestination;
import org.apache.pdfbox.pdmodel.interactive.documentnavigation.destination.PDPageDestination;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.IOException;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
public class PageExtractor {
private final Logger log = LoggerFactory.getLogger(this.getClass());
public PDDocument extractPage(PDDocument source, Integer pageNumber) throws IOException {
PDDocument targetPdf = new PDDocument();
targetPdf.getDocument().setVersion(source.getVersion());
targetPdf.setDocumentInformation(source.getDocumentInformation());
targetPdf.getDocumentCatalog().setViewerPreferences(source.getDocumentCatalog().getViewerPreferences());
PDPage sourcePage = source.getPage(pageNumber);
PDPage targetPage = targetPdf.importPage(sourcePage);
targetPage.setResources(sourcePage.getResources());
stripUnusedImages(targetPage);
stripPageLinks(targetPage);
return targetPdf;
}
/**
* Collect the images used from a custom PDFStreamEngine (BI and DO operators)
* Create an empty COSDictionary
* Loop through the page's XObjects that are images and add them to the new COSDictionary if they were found in the PDFStreamEngine
* Assign the newly filled COSDictionary to the page's resource as COSName.XOBJECT
*/
protected void stripUnusedImages(PDPage page) throws IOException {
PDResources resources = copyResources(page);
COSDictionary pageObjects = (COSDictionary) resources.getCOSObject().getDictionaryObject(COSName.XOBJECT);
COSDictionary newObjects = new COSDictionary();
Set<String> imageNames = findImageNames(page);
Iterable<COSName> xObjectNames = resources.getXObjectNames();
for (COSName xObjectName : xObjectNames) {
if (resources.isImageXObject(xObjectName)) {
Boolean used = imageNames.contains(xObjectName.getName());
if (used) {
newObjects.setItem(xObjectName, pageObjects.getItem(xObjectName));
} else {
log.info("Found unused image: name={}", xObjectName.getName());
}
} else {
newObjects.setItem(xObjectName, pageObjects.getItem(xObjectName));
}
}
resources.getCOSObject().setItem(COSName.XOBJECT, newObjects);
page.setResources(resources);
}
/**
* It is necessary to copy the page's resources since it can be shared with other pages. We must ensure changes
* to the resources are scoped to the current page.
*/
protected PDResources copyResources(PDPage page) {
return new PDResources(new COSDictionary(page.getResources().getCOSObject()));
}
protected Set<String> findImageNames(PDPage page) throws IOException {
Set<String> imageNames = new HashSet<>();
PdfImageStreamEngine engine = new PdfImageStreamEngine() {
#Override
void handleImage(Operator operator, List<COSBase> operands) {
COSName name = (COSName) operands.get(0);
imageNames.add(name.getName());
}
};
engine.processPage(page);
return imageNames;
}
/**
* Borrowed from PDFBox page splitter
*
* #see org.apache.pdfbox.multipdf.Splitter#processAnnotations(org.apache.pdfbox.pdmodel.PDPage)
*/
protected void stripPageLinks(PDPage imported) throws IOException {
List<PDAnnotation> annotations = imported.getAnnotations();
for (PDAnnotation annotation : annotations) {
if (annotation instanceof PDAnnotationLink) {
PDAnnotationLink link = (PDAnnotationLink) annotation;
PDDestination destination = link.getDestination();
if (destination == null && link.getAction() != null) {
PDAction action = link.getAction();
if (action instanceof PDActionGoTo) {
destination = ((PDActionGoTo) action).getDestination();
}
}
if (destination instanceof PDPageDestination) {
// TODO preserve links to pages within the splitted result
((PDPageDestination) destination).setPage(null);
}
}
// TODO preserve links to pages within the splitted result
annotation.setPage(null);
}
}
}
The stream reader used to analyze the page's images:
package org.apache.pdfbox.examples;
import org.apache.pdfbox.contentstream.PDFStreamEngine;
import org.apache.pdfbox.contentstream.operator.Operator;
import org.apache.pdfbox.contentstream.operator.OperatorProcessor;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.graphics.PDXObject;
import org.apache.pdfbox.pdmodel.graphics.form.PDFormXObject;
import org.apache.pdfbox.pdmodel.graphics.form.PDTransparencyGroup;
import java.io.IOException;
import java.util.List;
abstract public class PdfImageStreamEngine extends PDFStreamEngine {
PdfImageStreamEngine() {
addOperator(new DrawObjectCounter());
}
abstract void handleImage(Operator operator, List<COSBase> operands);
protected class DrawObjectCounter extends OperatorProcessor {
#Override
public void process(Operator operator, List<COSBase> operands) throws IOException {
if (operands != null && isImage(operands.get(0))) {
handleImage(operator, operands);
}
}
protected Boolean isImage(COSBase base) throws IOException {
if (!(base instanceof COSName)) {
return false;
}
COSName name = (COSName)base;
if (context.getResources().isImageXObject(name)) {
return true;
}
PDXObject xObject = context.getResources().getXObject(name);
if (xObject instanceof PDTransparencyGroup) {
context.showTransparencyGroup((PDTransparencyGroup)xObject);
} else if (xObject instanceof PDFormXObject) {
context.showForm((PDFormXObject)xObject);
}
return false;
}
#Override
public String getName() {
return "Do";
}
}
}

Why don't the iText 7 form field values print?

I have a PDF with forms that was generated with iText 7. However, when I fill out the form and attempt to print, the form values do not show up; only the form outline shows. How do I generate form fields that will display the value when printed?
A sample form generator:
import java.io.FileNotFoundException;
import java.io.IOException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import com.itextpdf.forms.PdfAcroForm;
import com.itextpdf.forms.fields.PdfFormField;
import com.itextpdf.forms.fields.PdfTextFormField;
import com.itextpdf.kernel.font.PdfFont;
import com.itextpdf.kernel.font.PdfFontFactory;
import com.itextpdf.kernel.geom.PageSize;
import com.itextpdf.kernel.pdf.PdfDocument;
import com.itextpdf.kernel.pdf.PdfWriter;
import com.itextpdf.layout.Document;
import com.itextpdf.layout.border.Border;
import com.itextpdf.layout.element.Cell;
import com.itextpdf.layout.element.Table;
import com.itextpdf.layout.renderer.CellRenderer;
import com.itextpdf.layout.renderer.DrawContext;
/**
* #author Lucas Vander Wal
*
*/
public final class TestForm {
private static final Logger log = LoggerFactory.getLogger(TestForm.class);
private static final PageSize LETTER_SIZE = new PageSize(612, 792),
LETTER_SIZE_LANDSCAPE = LETTER_SIZE.rotate();
private static final PdfFont FONT;
static {
try {
FONT = PdfFontFactory.createFont();
} catch (IOException e) {
throw new RuntimeException(e);
}
}
private static final float FONT_SIZE_12 = 12;
/**
* #param args
*/
public static void main(String[] args) {
log.info("Running pdf generation...");
new TestForm().generate("formTest.pdf");
log.info("Done with pdf generation.");
}
private Document doc;
private PdfDocument pdfDoc;
private PdfAcroForm form;
public TestForm() {
}
/**
* Generates the timesheet pdf and saves it to the given file location
*
* #param outputFile
*/
public void generate(String outputFile) {
try {
pdfDoc = new PdfDocument(new PdfWriter(outputFile));
doc = new Document(pdfDoc, LETTER_SIZE_LANDSCAPE);
// set document properties
doc.setFontSize(10);
float marginSize = doc.getTopMargin();
doc.setMargins(marginSize / 2, marginSize, marginSize / 2, marginSize);
form = PdfAcroForm.getAcroForm(pdfDoc, true);
// build the form
buildUserInfo();
// close the document
doc.close();
} catch (FileNotFoundException e) {
log.warn("Unable to save to file: " + outputFile, e);
}
}
private void buildUserInfo() {
// build the user info table
Table userTable = new Table(4).setBorder(Border.NO_BORDER).setWidthPercent(100).setMargin(0).setPadding(0);
// add 4 text entry fields
this.addFieldToTable(userTable, "nameEmployee", "Name of Employee:");
// add the table to the document
this.doc.add(userTable);
}
private void addFieldToTable(Table t, String fieldName, String fieldLabel) {
t.addCell(new Cell().add(fieldLabel).setPadding(5));
Cell cell = new Cell().setPadding(5);
cell.setNextRenderer(new CellRenderer(cell) {
#Override
public void draw(DrawContext drawContext) {
super.draw(drawContext);
PdfTextFormField field = PdfFormField.createText(
drawContext.getDocument(), getOccupiedAreaBBox().decreaseHeight(5), fieldName, "",
FONT, FONT_SIZE_12);
form.addField(field);
}
});
t.addCell(cell);
}
}
PDF form fields have a visibility setting that affects the ability to print. See this Adobe page for more information.
The form fields need to be set to 'VISIBLE', using the setVisibility() method on PdfFormField objects. This iText 7 documentation explains how to set forms to 'HIDDEN','VISIBLE_BUT_DOES_NOT_PRINT', and 'HIDDEN_BUT_PRINTABLE', however it doesn't mention setting to visible, and in fact, visible is not one of the static options in PdfFormField.
The three values listed above correspond to the integers 1, 2, and 3, and on a hunch, I set the visibility to 0, and the form field displayed when printed!
In summary, call PdfFormField.setVisibility(0) to create form fields that are printable.

Convert Byte array to pdf for printing

I have a byte array and I need to send this byte array over the print server socket for printing as pdf. How do I convert this byte array into pdf bytes ?.
Basically, I created a pdf template and using PdfStamper, I am generating the pdf
PdfStamper stamper = new PdfStamper(pdfTemplate, out);
and then convering the ByteArrayOutputStream to byte array (out.toByteArray()). I am sending this byte array to another service which just picks the byte array and send it over the socket for printing. I tried to print this and this prints nothing but a blank page.
I guess, the printer is not recognizing this as a pdf. How do I tell printer that my byte array is pdf (How do I convert the byte array to pdf printable format)
calling this from method where I m passing my byte array and
DocFlavor flavor = DocFlavor.BYTE_ARRAY.AUTOSENSE;
You have to call this method by passing flavor and byte array Like this
new PrintTest().print(byteArray, flavor);
import java.awt.Graphics;
import java.awt.print.PageFormat;
import java.awt.print.Printable;
import java.awt.print.PrinterException;
import java.awt.print.PrinterJob;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.util.Set;
import javax.print.Doc;
import javax.print.DocFlavor;
import javax.print.DocPrintJob;
import javax.print.PrintException;
import javax.print.PrintService;
import javax.print.SimpleDoc;
import javax.print.attribute.DocAttributeSet;
import javax.print.attribute.standard.PrinterStateReason;
import javax.print.attribute.standard.PrinterStateReasons;
import javax.print.attribute.standard.Severity;
import javax.print.event.PrintJobAttributeEvent;
import javax.print.event.PrintJobAttributeListener;
import javax.print.event.PrintJobEvent;
import javax.print.event.PrintJobListener;
import javax.print.event.PrintServiceAttributeEvent;
import javax.print.event.PrintServiceAttributeListener;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
public class PrintTest implements PrintServiceAttributeListener,PrintJobListener,Doc, Printable, PrintJobAttributeListener {
protected Logger logger = LoggerFactory.getLogger(PrintTest.class);
public void print(final byte[] byteArray, final DocFlavor flavor) {
Thread newThread = new Thread(new Runnable() {
public void run() {
PrintService ps = PrinterJob.getPrinterJob().getPrintService();
ps.addPrintServiceAttributeListener(PrintTest.this);
DocPrintJob docJob = ps.createPrintJob();
docJob.addPrintJobAttributeListener(PrintTest.this, null);
docJob.addPrintJobListener(PrintTest.this);
Doc document = new SimpleDoc(byteArray, flavor, null);
try {
docJob.print(document,null);
}
catch (PrintException e) {
e.printStackTrace(); //To change body of catch statement use File | Settings | File Templates.
}
}
});
newThread.start();
/**
PrintServiceAttributeSet attSet = ps.getAttributes();
PrinterStateReasons psr = ps.getAttribute(PrinterStateReasons.class);
if (psr != null) {
Set<PrinterStateReason> errors = psr.printerStateReasonSet(Severity.REPORT);
for (PrinterStateReason reason : errors)
System.out.printf(" Reason : %s",reason.getName());
logger.info();
} */
}
public void attributeUpdate(PrintServiceAttributeEvent psae) {
logger.info("attributeUpdate: "+psae.getAttributes());
}
public void printDataTransferCompleted(PrintJobEvent pje) {
logger.info("Transfer completed");
}
public void printJobCompleted(PrintJobEvent pje) {
logger.info("Completed");
}
public void printJobFailed(PrintJobEvent pje) {
logger.info("Failed");
PrinterStateReasons psr = pje.getPrintJob().getPrintService().getAttribute(PrinterStateReasons.class);
if (psr != null) {
Set<PrinterStateReason> errors = psr.printerStateReasonSet(Severity.REPORT);
for (PrinterStateReason reason : errors)
logger.info(" Reason : %s",reason.getName());
logger.info("\n");
}
}
public void printJobCanceled(PrintJobEvent pje) {
logger.info("Canceled");
}
public void printJobNoMoreEvents(PrintJobEvent pje) {
logger.info("No more events");
logger.info("printJobNoMoreEvents: "+pje.getPrintEventType());
}
public void printJobRequiresAttention(PrintJobEvent pje) {
logger.info("Job requires attention");
PrinterStateReasons psr = pje.getPrintJob().getPrintService().getAttribute(PrinterStateReasons.class);
if (psr != null) {
Set<PrinterStateReason> errors = psr.printerStateReasonSet(Severity.REPORT);
for (PrinterStateReason reason : errors)
logger.info(" Reason : %s",reason.getName());
logger.info("\n");
}
}
public DocFlavor getDocFlavor() {
return DocFlavor.SERVICE_FORMATTED.PRINTABLE; //To change body of implemented methods use File | Settings | File Templates.
}
public Object getPrintData() throws IOException {
return this;
}
public DocAttributeSet getAttributes() {
return null; //To change body of implemented methods use File | Settings | File Templates.
}
public Reader getReaderForText() throws IOException {
return null; //To change body of implemented methods use File | Settings | File Templates.
}
public InputStream getStreamForBytes() throws IOException {
return null; //To change body of implemented methods use File | Settings | File Templates.
}
public int print(Graphics graphics, PageFormat pageFormat, int pageIndex) throws PrinterException {
return pageIndex == 0 ? PAGE_EXISTS : NO_SUCH_PAGE; //To change body of implemented methods use File | Settings | File Templates.
}
public void attributeUpdate(PrintJobAttributeEvent pjae) {
logger.info("Look out");
}
}

How to get path of current selected file in Eclipse plugin development

I am opening an Editor with Open with menu in Eclipse.But i am not able to get path of current selected file.Sometimes it gives proper path but sometimes throws null pointer Exception.
I am writing following code to get selected file path.
IWorkbenchPage iwPage=PlatformUI.getWorkbench().getActiveWorkbenchWindow().getActivePage();
System.err.println("iwpage::"+iwPage);
ISelection selection=iwPage.getSelection();
System.err.println("selection::::testtttt"+selection.toString());
if(selection!=null && selection instanceof IStructuredSelection)
{
IStructuredSelection selectedFileSelection = (IStructuredSelection) selection;
System.out.println(selection.toString());
Object obj = selectedFileSelection.getFirstElement();
selectedFile=(IResource)obj;
System.err.println("selection::::"+selectedFile.getLocation().toString());
String html=selectedFile.getLocation().toString().replace(" ","%20");
String html_file="file:///"+html;
return html_file;
}
I found an answer in the Eclipse forum that seems easier and works for me so far.
IWorkbench workbench = PlatformUI.getWorkbench();
IWorkbenchWindow window =
workbench == null ? null : workbench.getActiveWorkbenchWindow();
IWorkbenchPage activePage =
window == null ? null : window.getActivePage();
IEditorPart editor =
activePage == null ? null : activePage.getActiveEditor();
IEditorInput input =
editor == null ? null : editor.getEditorInput();
IPath path = input instanceof FileEditorInput
? ((FileEditorInput)input).getPath()
: null;
if (path != null)
{
// Do something with path.
}
Some of those classes required new project references, so here's a list of all my imports for that class. Not all of them are related to this snippet, of course.
import org.eclipse.core.runtime.FileLocator;
import org.eclipse.core.runtime.IPath;
import org.eclipse.core.runtime.Path;
import org.eclipse.jface.text.ITextViewer;
import org.eclipse.jface.text.source.CompositeRuler;
import org.eclipse.jface.text.source.LineNumberRulerColumn;
import org.eclipse.swt.widgets.Composite;
import org.eclipse.swt.widgets.Control;
import org.eclipse.ui.IEditorInput;
import org.eclipse.ui.IEditorPart;
import org.eclipse.ui.IWorkbench;
import org.eclipse.ui.IWorkbenchPage;
import org.eclipse.ui.IWorkbenchWindow;
import org.eclipse.ui.PlatformUI;
import org.eclipse.ui.part.FileEditorInput;
import org.osgi.framework.Bundle;
You can ask the active editor for the path of the underlying file. Just register an IPartListener to your active IWorkbenchPage and ask withing this listener when activating a part. Here is a snippet
PlatformUI.getWorkbench().getActiveWorkbenchWindow().getActivePage()
.addPartListener(new IPartListener() {
#Override
public void partOpened(IWorkbenchPart part) {
// TODO Auto-generated method stub
}
#Override
public void partDeactivated(IWorkbenchPart part) {
// TODO Auto-generated method stub
}
#Override
public void partClosed(IWorkbenchPart part) {
// TODO Auto-generated method stub
}
#Override
public void partBroughtToTop(IWorkbenchPart part) {
if (part instanceof IEditorPart) {
if (((IEditorPart) part).getEditorInput() instanceof IFileEditorInput) {
IFile file = ((IFileEditorInput) ((EditorPart) part)
.getEditorInput()).getFile();
System.out.println(file.getLocation());
}
}
}
#Override
public void partActivated(IWorkbenchPart part) {
// TODO Auto-generated method stub
}
});

AnalyzerUtil error on Lucene

I'm learning to work with lucene. I wrote a simple program to test lucene analyzers like:
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.WhitespaceAnalyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.analysis.StopAnalyzer;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.util.Version;
import org.apache.lucene.wordnet.AnalyzerUtils;
import java.io.IOException;
public class AnalyzerDemo {
private static final String[] examples = {
"The quick brown fox jumped over the lazy dog",
"XY&Z Corporation - xyz#example.com"
};
private static final Analyzer[] analyzers = new Analyzer[] {
new WhitespaceAnalyzer(),
new SimpleAnalyzer(),
new StopAnalyzer(Version.LUCENE_30),
new StandardAnalyzer(Version.LUCENE_30)
};
public static void main(String[] args) throws IOException {
String[] strings = examples;
if (args.length > 0) {
strings = args;
}
for (String text : strings) {
analyze(text);
}
}
private static void analyze(String text) throws IOException {
System.out.println("Analyzing \"" + text + "\"");
for (Analyzer analyzer : analyzers) {
String name = analyzer.getClass().getSimpleName();
System.out.println(" " + name + ":");
System.out.print(" ");
AnalyzerUtils.displayTokens(analyzer, text);
System.out.println("\n");
}
}
}
but I got the following error:
AnalyzerDemo.java:7: package org.apache.lucene.wordnet does not exist
import org.apache.lucene.wordnet.AnalyzerUtils;
^
AnalyzerDemo.java:35: cannot find symbol
symbol : variable AnalyzerUtils
location: class AnalyzerDemo
AnalyzerUtils.displayTokens(analyzer, text);
^
Note: AnalyzerDemo.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
2 errors
I think library wordnet or AnalyzerUtils is not available. How can I install this part of lucene? Do you have any ideas? Why is that missing? I've installed lucene 3.5.0.
lucene-wordnet contrib module was removed in Lucene 3.4.0. AnalyzerUtils also doesn't exist, so you either have to get Lucene 3.3.0 or write your own for 3.5.0 based on this one.
In case of word-net, the word-net contrib was removed from lucene 3.4.0 and the functionality was merged with analyzers contrib. Point number 4 at: http://apache.spinellicreations.com/lucene/java/3.4.0/changes-3.4.0/Contrib-Changes.html#3.4.0.new_features.
Java docs can be found for the same at: https://lucene.apache.org/core/old_versioned_docs/versions/3_5_0/api/all/org/apache/lucene/analysis/synonym/SynonymFilter.html
Instead of AnalyzerUtils.displayTokens(analyzer,text);
Use the function:
private static void displayTokens(Analyzer analyzer,String text) throws IOException
{
TokenStream stream=analyzer.tokenStream(null,new StringReader(text));
CharTermAttribute cattr = stream.addAttribute(CharTermAttribute.class);
stream.reset();
while (stream.incrementToken()){
System.out.print(cattr.toString()+" ");
}
stream.end();
stream.close();
}