Attempting to write my first class with docx4j (http://www.docx4java.org). Basically the idea is to find a string of text in the .docx file and replace it with another string of text. Essentially a mail merge. While I'm not receiving any errors, the merged document itself is not being saved in the path I've suggested. This makes me think it's a file path problem but I don't see anything wrong with it.
package efi.mailmerge.servlets;
import java.util.List;
import java.util.logging.Level;
import java.util.logging.Logger;
import javax.xml.bind.JAXBElement;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.docx4j.wml.Text;
public class WordDocTest {
/**
* Open word document /Users/Jeff/Development/ReServe-Unleashed/Dev/MailMerge/uploads/Sample.docx, replace a piece of text and save
* the result to /Users/Jeff/Development/ReServe-Unleashed/Dev/MailMerge/uploads/Sample-Out.docx.
*
* The text <<CUS_FNAME>> will be replaced with John.
*
* @param args
*/
public static void main(String[] args) {
// Text nodes begin with w:t in the word document
final String XPATH_TO_SELECT_TEXT_NODES = "//w:t";
try {
// Open the input file
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new java.io.File("/Users/Jeff/Development/ReServe-Unleashed/Dev/MailMerge/uploads/Sample.docx"));
// Build a list of "text" elements
List texts = wordMLPackage.getMainDocumentPart().getJAXBNodesViaXPath(XPATH_TO_SELECT_TEXT_NODES, true);
// Loop through all "text" elements
for (Object obj : texts) {
Text text = (Text) ((JAXBElement) obj).getValue();
// Get the text value
String textValueBefore = text.getValue();
// Perform the replacement
String textValueAfter = textValueBefore.replaceAll("<<CUS_FNAME>>", "John");
// Show the element before and after the replacement
System.out.println("textValueBefore = " + textValueBefore);
System.out.println("textValueAfter = " + textValueAfter);
// Update the text element now that we have performed the replacement
text.setValue(textValueAfter);
}
wordMLPackage.save(new java.io.File("/Users/Jeff/Development/ReServe-Unleashed/Dev/MailMerge/uploads/Sample-Out.docx"));
} catch (Docx4JException e) {
Logger.getLogger(WordDocTest.class.getName()).log(Level.SEVERE, null, e);
e.printStackTrace();
} catch (Exception e) {
Logger.getLogger(WordDocTest.class.getName()).log(Level.SEVERE, null, e);
e.printStackTrace();
}
}
}
The input and output paths are the arguments to the WordprocessingMLPackage.load(...) and wordMLPackage.save(...) calls above. I've confirmed that the Sample.docx input file exists and that the uploads directory has write permissions. Can you see anything wrong with my file paths? I could be completely on the wrong track, but this is all very new to me, so I'm learning as I go.
Any and all help is very much appreciated.
At first sight, I would suggest writing your path the following way:
wordMLPackage.save(new java.io.File("\\Users\\Jeff\\Development\\ReServe-Unleashed\\Dev\\MailMerge\\uploads\\Sample-Out.docx"));
If that still doesn't work, please provide the stack trace; it could help. (If no document is saved, an exception must have been thrown.)
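Before changing the path syntax, it can help to check what the JVM actually sees for the output location. The following is a minimal sketch using only java.io.File; the class and method names (OutputPathCheck, describe) are made up for illustration, and it makes no docx4j calls.

```java
import java.io.File;

final class OutputPathCheck {

    /** Returns a one-line report on whether a file could be written at the given path. */
    static String describe(File target) {
        File abs = target.getAbsoluteFile();
        File parent = abs.getParentFile();
        boolean parentOk = parent != null && parent.isDirectory();
        boolean writable = parentOk && parent.canWrite();
        return "path=" + abs.getPath()
                + " parentExists=" + parentOk
                + " parentWritable=" + writable
                + " alreadyExists=" + abs.exists();
    }

    public static void main(String[] args) {
        // Path taken from the question; replace it with your own output location.
        File out = new File(
                "/Users/Jeff/Development/ReServe-Unleashed/Dev/MailMerge/uploads/Sample-Out.docx");
        System.out.println(describe(out));
    }
}
```

If parentExists or parentWritable comes back false, the problem lies with the path rather than with docx4j. If both are true and alreadyExists is false after save(...) has run, the stack trace is the next thing to look at.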
I have a requirement to add a timestamp to the name of the screenshot image that is saved in the /img folder. Looking at AssertionService.java (https://github.com/qmetry/qaf/blob/master/src/com/qmetry/qaf/automation/ui/selenium/AssertionService.java), I see that it appends a random string to the file name.
How can I remove this random string and add a timestamp instead? Thanks in advance for the help!
private String captureScreenShot() {
String filename = StringUtil.createRandomString(getTestCaseName()) + ".png";
try {
selenium.captureEntirePageScreenshot(getScreenShotDir() + filename, "");
} catch (Exception e) {
try {
selenium.windowFocus();
} catch (Throwable t) {
logger.error(t);
}
selenium.captureScreenshot(getScreenShotDir() + filename);
}
lastCapturedScreenShot = filename;
logger.info("Captured screen shot: " + lastCapturedScreenShot);
return filename;
}
Are you using the Selenium 1 or the Selenium 2 API? Selenium 2 uses the following code: https://github.com/qmetry/qaf/blob/d58b1d1ca01b2df1a916bcd6d555df4f51a13b12/src/com/qmetry/qaf/automation/core/QAFTestBase.java#L351. Regardless of the API, you can't change the naming strategy for automatic screenshots. As an alternative, you can disable automatic screenshot capture, capture screenshots yourself as and when needed, and register each one by calling setLastCapturedScreenShot.
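If you capture screenshots yourself, the timestamped file name can be built with java.time. This is only a sketch of the naming part: the class and method names (ScreenshotNames, timestamped) and the pattern are assumptions for illustration, and passing the resulting name to the capture call and to setLastCapturedScreenShot is not shown.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

final class ScreenshotNames {

    // The pattern avoids ':' and spaces so the name is valid on common file systems.
    private static final DateTimeFormatter STAMP =
            DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss_SSS");

    /** Builds "<testCaseName>_<timestamp>.png" for the given moment. */
    static String timestamped(String testCaseName, LocalDateTime when) {
        return testCaseName + "_" + when.format(STAMP) + ".png";
    }

    public static void main(String[] args) {
        // "loginTest" is a placeholder test case name.
        System.out.println(timestamped("loginTest", LocalDateTime.now()));
    }
}
```

Including milliseconds in the pattern makes name collisions unlikely when several screenshots are taken within the same second.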
I want to print a node to a PDF file using the "Microsoft Print to PDF" printer. Assuming the Printer object has already been obtained, I have the following function, which works perfectly.
public static void printToPDF(Printer printer, Node node) {
PrinterJob job = PrinterJob.createPrinterJob(printer);
if (job != null) {
job.getJobSettings().setPrintQuality(PrintQuality.HIGH);
PageLayout pageLayout = job.getPrinter().createPageLayout(Paper.A4, PageOrientation.PORTRAIT,
Printer.MarginType.HARDWARE_MINIMUM);
boolean printed = job.printPage(pageLayout, node);
if (printed) {
job.endJob();
} else {
System.out.println("Printing failed.");
}
} else {
System.out.println("Could not create a printer job.");
}
}
The only issue I have is that a dialog box pops up asking for a destination path to save the PDF. I have been struggling to find a way to set the path programmatically, with no success. Any suggestions? Thank you in advance.
After some more research I came up with an ugly hack. I accessed the private jobImpl field of PrinterJob and took its attribute set (printReqAttrSet) out of it. I then added the Destination attribute to that set, and it appears to work as requested. I know it is not nice, but it is workable. If you have a nicer suggestion, please do not hesitate to post it.
try {
java.lang.reflect.Field field = job.getClass().getDeclaredField("jobImpl");
field.setAccessible(true);
PrinterJobImpl jobImpl = (PrinterJobImpl) field.get(job);
field.setAccessible(false);
field = jobImpl.getClass().getDeclaredField("printReqAttrSet");
field.setAccessible(true);
PrintRequestAttributeSet printReqAttrSet = (PrintRequestAttributeSet) field.get(jobImpl);
field.setAccessible(false);
printReqAttrSet.add(new Destination(new java.net.URI("file:/C:/deleteMe/wtv.pdf")));
} catch (Exception e) {
System.err.println(e);
}
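For reference, Destination is a standard javax.print attribute built from a file: URI. The sketch below only shows constructing that attribute from a file system path (instead of a hand-written URI string) and placing it in an attribute set of your own. It does not touch JavaFX's PrinterJob, and it does not show whether a given printer driver honours the attribute. The class name DestinationAttr and the path out.pdf are placeholders.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;
import javax.print.attribute.HashPrintRequestAttributeSet;
import javax.print.attribute.PrintRequestAttributeSet;
import javax.print.attribute.standard.Destination;

final class DestinationAttr {

    /** Converts a file system path to the file: URI form that Destination expects. */
    static Destination forPath(Path output) {
        URI uri = output.toAbsolutePath().toUri();
        return new Destination(uri);
    }

    public static void main(String[] args) {
        PrintRequestAttributeSet attrs = new HashPrintRequestAttributeSet();
        attrs.add(forPath(Paths.get("out.pdf")));
        // Read the attribute back to show the URI it carries.
        Destination d = (Destination) attrs.get(Destination.class);
        System.out.println(d.getURI());
    }
}
```

Letting Path.toUri() build the URI takes care of escaping spaces and other special characters in the path.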
Please help me understand if my solution is correct.
I'm trying to extract text from a PDF file with a LocationTextExtractionStrategy parser. I'm getting exceptions because the ParseContentMethod tries to parse inline images? The code is simple and looks similar to this:
RenderFilter[] filter = { new RegionTextRenderFilter(cropBox) };
ITextExtractionStrategy strategy = new FilteredTextRenderListener(new LocationTextExtractionStrategy(), filter);
PdfTextExtractor.GetTextFromPage(pdfReader, pageNumber, strategy);
I realize the images are in the content stream, but I have a PDF file for which text extraction fails because of inline images. It throws an UnsupportedPdfException of "The filter /DCTDECODE is not supported" and then finally fails with an InlineImageParseException of "Could not find image data or EI", when all I really care about is the text. The BI/EI exists in my file, so I assume this failure is caused by the /DCTDECODE exception. But again, I don't care about images; I'm looking for text.
My current solution for this is to add a filterHandler in the InlineImageUtils class that assigns the Filter_DoNothing() filter to the DCTDECODE filterHandler dictionary. This way I don't get exceptions when I have InlineImages with DCTDECODE. Like this:
private static bool InlineImageStreamBytesAreComplete(byte[] samples, PdfDictionary imageDictionary) {
try {
IDictionary<PdfName, FilterHandlers.IFilterHandler> handlers = new Dictionary<PdfName, FilterHandlers.IFilterHandler>(FilterHandlers.GetDefaultFilterHandlers());
handlers[PdfName.DCTDECODE] = new Filter_DoNothing();
PdfReader.DecodeBytes(samples, imageDictionary, handlers);
return true;
} catch (IOException e) {
return false;
}
}
public class Filter_DoNothing : FilterHandlers.IFilterHandler
{
public byte[] Decode(byte[] b, PdfName filterName, PdfObject decodeParams, PdfDictionary streamDictionary)
{
return b;
}
}
My problem with this "fix" is that I had to change the iTextSharp library. I'd rather not do that so I can try to stay compatible with future versions.
Here's the PDF in question:
https://app.box.com/s/7eaewzu4mnby9ogpl2frzjswgqxn9rz5
I have created a sample program that tries to import XFDF into a PDF using the Aspose library. The program runs without exceptions, but the output PDF does not include any annotations. Any suggestions to solve this problem?
Update - 2014-12-12
I have also sent the issue to Aspose. They can reproduce the same problem and logged a ticket PDFNEWJAVA-34609 in their issue tracking system.
Following is my sample program:
public static void main(String[] args) {
final String ROOT = "C:\\PdfAnnotation\\";
final String sourcePDF = "hackermonthly-issue.pdf";
final String destPDF = "output.pdf";
final String sourceXFDF = "XFDFTest.xfdf";
try
{
// Specify the path of license file
License lic = new License();
lic.setLicense(ROOT + "Aspose.Pdf.lic");
//create an object of PdfAnnotationEditor class
PdfAnnotationEditor editor = new PdfAnnotationEditor();
//bind input PDF file
editor.bindPdf(ROOT + sourcePDF);
//create a file stream for input XFDF file to import annotations
FileInputStream fileStream = new FileInputStream(ROOT + sourceXFDF);
//create an enumeration of all the annotation types which you want to import
//int[] annType = {AnnotationType.Ink };
//import annotations of specified type(s) from XFDF file
//editor.importAnnotationFromXfdf(fileStream, annType);
editor.importAnnotationFromXfdf(fileStream);
//save output pdf file
editor.save(ROOT + destPDF);
} catch (Exception e) {
System.out.println("exception: " + e.getMessage());
}
}
I have searched for a possible solution by googling, on SO, and in the pdfClown/pdfbox forums, and I am now posting the problem on SO.
Problem: I have been trying to find a way to highlight text that spans multiple lines in a PDF document. The PDF can have one-column or two-column pages.
Using pdf-clown, I was able to highlight phrases ONLY if all the words appear on the same line. pdfBox produced XML for individual words, but I could not find a solution for phrases or lines.
Please suggest a solution for pdf-clown, if there is one, or any other Java-compatible tool that can highlight text across multiple lines in a PDF.
I could not understand the answer to this similar question, which is about iText. Any help?
Multiline markup annotations with iText
It is possible to get the coordinates of each character in a PDF document using pdfbox. Here is the code for it:
import java.io.File;
import org.apache.pdfbox.exceptions.InvalidPasswordException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDStream;
import org.apache.pdfbox.util.PDFTextStripper;
import org.apache.pdfbox.util.TextPosition;
import java.io.IOException;
import java.util.List;
public class PrintTextLocations extends PDFTextStripper {
public PrintTextLocations() throws IOException {
super.setSortByPosition(true);
}
public static void main(String[] args) throws Exception {
PDDocument document = null;
try {
File input = new File("C:\\path\\to\\PDF.pdf");
document = PDDocument.load(input);
if (document.isEncrypted()) {
try {
document.decrypt("");
} catch (InvalidPasswordException e) {
System.err.println("Error: Document is encrypted with a password.");
System.exit(1);
}
}
PrintTextLocations printer = new PrintTextLocations();
List allPages = document.getDocumentCatalog().getAllPages();
for (int i = 0; i < allPages.size(); i++) {
PDPage page = (PDPage) allPages.get(i);
System.out.println("Processing page: " + i);
PDStream contents = page.getContents();
if (contents != null) {
printer.processStream(page, page.findResources(), page.getContents().getStream());
}
}
} finally {
if (document != null) {
document.close();
}
}
}
// Called by PDFTextStripper for every piece of text it encounters
@Override
protected void processTextPosition(TextPosition text) {
System.out.println("String[" + text.getXDirAdj() + ","
+ text.getYDirAdj() + " fs=" + text.getFontSize() + " xscale="
+ text.getXScale() + " height=" + text.getHeightDir() + " space="
+ text.getWidthOfSpace() + " width="
+ text.getWidthDirAdj() + "]" + text.getCharacter());
}
}
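Once per-character coordinates like the ones printed above are available, a phrase that wraps across lines can be covered with one rectangle per line rather than a single box. The sketch below assumes the glyph boxes of the phrase are already collected in reading order, with x, y, width and height taken from values such as getXDirAdj(), getYDirAdj(), getWidthDirAdj() and getHeightDir(). Box and perLine are hypothetical names, and turning the rectangles into highlight annotations depends on the library and is not shown.

```java
import java.util.ArrayList;
import java.util.List;

final class LineBoxes {

    /** Axis-aligned box of a glyph or of a whole line. */
    static final class Box {
        final float x, y, width, height;

        Box(float x, float y, float width, float height) {
            this.x = x;
            this.y = y;
            this.width = width;
            this.height = height;
        }
    }

    /**
     * Merges glyph boxes (given in reading order) into one box per line.
     * A new line starts when y differs from the current line by more than tolerance.
     */
    static List<Box> perLine(List<Box> glyphs, float tolerance) {
        List<Box> lines = new ArrayList<>();
        Box current = null;
        for (Box g : glyphs) {
            if (current != null && Math.abs(g.y - current.y) <= tolerance) {
                // Same line: extend the current box horizontally.
                float left = Math.min(current.x, g.x);
                float right = Math.max(current.x + current.width, g.x + g.width);
                current = new Box(left, current.y, right - left,
                        Math.max(current.height, g.height));
            } else {
                if (current != null) {
                    lines.add(current);
                }
                current = g;
            }
        }
        if (current != null) {
            lines.add(current);
        }
        return lines;
    }
}
```

Each returned box can then serve as one highlight rectangle, so a phrase broken over two lines yields two rectangles.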
Multi-column text is, at the moment (PDF Clown 0.1.2), not supported for extraction: the current algorithm gathers text lying on the same horizontal baseline without evaluating possible gaps between columns.
Automatic detection of multi-column layouts would be possible yet somewhat tricky, as PDF is essentially (you know) an unstructured graphic format. Nonetheless, I'm considering experimenting with it, in order to deal at least with the most common scenarios.
In the meantime, I can suggest an effective workaround (it assumes that you work on a document whose columns are placed in predictable areas): for each column, do a separate text extraction, instructing the TextExtractor to look into the corresponding page area; then put all those partial extraction results together and apply your filter.
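The workaround can be illustrated independently of any PDF library. The sketch below assumes the text fragments and their positions have already been extracted, that the column boundaries are known in advance, and that y grows downwards. Fragment, Column and readByColumns are hypothetical names and not part of PDF Clown.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ColumnMerge {

    /** A piece of extracted text with the position of its top-left corner. */
    static final class Fragment {
        final double x, y;
        final String text;

        Fragment(double x, double y, String text) {
            this.x = x;
            this.y = y;
            this.text = text;
        }
    }

    /** Horizontal extent of one column: left inclusive, right exclusive. */
    static final class Column {
        final double left, right;

        Column(double left, double right) {
            this.left = left;
            this.right = right;
        }
    }

    /** Reads each column top to bottom, in the order given, and joins the text with spaces. */
    static String readByColumns(List<Fragment> fragments, List<Column> columns) {
        StringBuilder out = new StringBuilder();
        for (Column c : columns) {
            // "Separate extraction" for this column: keep only fragments inside its area.
            List<Fragment> inColumn = new ArrayList<>();
            for (Fragment f : fragments) {
                if (f.x >= c.left && f.x < c.right) {
                    inColumn.add(f);
                }
            }
            inColumn.sort(Comparator.comparingDouble((Fragment f) -> f.y)
                    .thenComparingDouble(f -> f.x));
            for (Fragment f : inColumn) {
                if (out.length() > 0) {
                    out.append(' ');
                }
                out.append(f.text);
            }
        }
        return out.toString();
    }
}
```

The phrase search (the filter) then runs on the merged string, so a phrase that continues from the bottom of one column to the top of the next is still found.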