Issues converting docx to pdf using docx4j

Issues converting docx to pdf using docx4j - docx4j

I am using docx4j 2.8.1 and I tried to convert several different docx file, but i have always the same issue.
maybe the issue is coming from the version of the library or some dependency missing.
Code:
package test;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import org.docx4j.convert.out.pdf.PdfConversion;
import org.docx4j.convert.out.pdf.viaXSLFO.PdfSettings;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
public class pdfConverter {
public static void main(String[] args) {
createPDF();
}
private static void createPDF() {
try {
// 1) Load DOCX into WordprocessingMLPackage
InputStream is = new FileInputStream(
new File("D:/TestDoc/Res.docx"));
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage
.load(is);
// 2) Prepare Pdf settings
PdfSettings pdfSettings = new PdfSettings();
// 3) Convert WordprocessingMLPackage to Pdf
OutputStream out = new FileOutputStream(new File(
"D:/TestDoc/Res.pdf"));
PdfConversion converter = new org.docx4j.convert.out.pdf.viaXSLFO.Conversion(
wordMLPackage);
converter.output(out, pdfSettings);
} catch (Throwable e) {
e.printStackTrace();
}
}
}
Error:
org.docx4j.openpackaging.exceptions.Docx4JException: FOP issues
at org.docx4j.convert.out.pdf.viaXSLFO.Conversion.output(Conversion.java:374)
at test.pdfConverter.createPDF(pdfConverter.java:42)
at test.pdfConverter.main(pdfConverter.java:21)
Caused by: java.lang.NullPointerException
at org.docx4j.XmlUtils.transform(XmlUtils.java:842)
at org.docx4j.XmlUtils.transform(XmlUtils.java:802)
at org.docx4j.convert.out.pdf.viaXSLFO.Conversion.output(Conversion.java:349)
... 2 more

Solved by changing to jars. i used 2.8.0 and its fine now.

Related

docx to PDF conversion using docx4j

I am trying to convert docx file to pdf using docx4j. But I am getting an error
What I found that docx4j older versions were not able to support shapes inserted in the docs for pdf conversion.
Does any current version of docx4J, documents4j and docx4j-export-fo combination has these supported?
In the input file there were few lines drawn or inserted, as a place holder to populate values, these lines are erroring out while converting to PDF
```
import org.docx4j.Docx4J;
import org.docx4j.Docx4jProperties;
import org.docx4j.openpackaging.packages.WordprocessingMLPackage;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
public class Main {
private static final Logger LOGGER = LoggerFactory.getLogger(Main.class);
public static final int FLAG_EXPORT_PREFER_XSL = 0;
public static void main(String[] args) {
Docx4jProperties.setProperty(
"com.plutext.converter.URL",
"https://converter-eval.plutext.com:443/v1/00000000-0000-0000-0000-000000000000/convert");
try {
InputStream templateInputStream = new FileInputStream("C:\\Documents\\Output_docx\\Directors.docx");
WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(templateInputStream);
String outputfilepath = "C:\\Documents\\Output_docx\\Example_output_2.pdf";
FileOutputStream os = new FileOutputStream(outputfilepath);
Docx4J.toPDF(wordMLPackage,os);
os.flush();
os.close();
} catch (Throwable e) {
LOGGER.warn("Conversion Error!");
e.printStackTrace();
}
}
}
```

See https://www.docx4java.org/blog/2020/09/office-pptxxlsxdocx-to-pdf-to-in-docx4j-8-2-3/ which compares the 3 approaches for generating PDF output.
In summary to handle those objects, use the documents4j approach (which uses Word), or Microsoft Graph.

Query Expansion lucene

I am new to lucene and I am trying to do query expansion.
I have referred to these two posts (first , second) and I've managed to reuse the code in a way that suits version 6.0.0, as the one in the previous is deprecated.
The issue is, either I'm not getting a results or I didn't access the results (expanded queries) appropriately.
Here is my code:
import com.sun.corba.se.impl.util.Version;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.io.UnsupportedEncodingException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.text.ParseException;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.standard.ClassicTokenizer;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.synonym.SynonymFilter;
import org.apache.lucene.analysis.synonym.SynonymMap;
import org.apache.lucene.analysis.synonym.WordnetSynonymParser;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.util.*;
public class Graph extends Analyzer
{
protected static TokenStreamComponents createComponents(String fieldName, Reader reader) throws ParseException{
System.out.println("1");
// TODO Auto-generated method stub
Tokenizer source = new ClassicTokenizer();
source.setReader(reader);
TokenStream filter = new StandardFilter( source);
filter = new LowerCaseFilter(filter);
SynonymMap mySynonymMap = null;
try {
mySynonymMap = buildSynonym();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
filter = new SynonymFilter(filter, mySynonymMap, false);
return new TokenStreamComponents(source, filter);
}
private static SynonymMap buildSynonym() throws IOException, ParseException
{ System.out.print("build");
File file = new File("wn\\wn_s.pl");
InputStream stream = new FileInputStream(file);
Reader rulesReader = new InputStreamReader(stream);
SynonymMap.Builder parser = null;
parser = new WordnetSynonymParser(true, true, new StandardAnalyzer(CharArraySet.EMPTY_SET));
System.out.print(parser.toString());
((WordnetSynonymParser) parser).parse(rulesReader);
SynonymMap synonymMap = parser.build();
return synonymMap;
}
public static void main (String[] args) throws UnsupportedEncodingException, IOException, ParseException
{
Reader reader = new FileReader("C:\\input.txt"); // here I have the queries that I want to expand
TokenStreamComponents TSC = createComponents( "" , new StringReader("some text goes here"));
**System.out.print(TSC); //How to get the result from TSC????**
}
#Override
protected TokenStreamComponents createComponents(String string)
{
throw new UnsupportedOperationException("Not supported yet."); //To change body of generated methods, choose Tools | Templates.
}
}
Please suggest ways to help me access the expanded queries!

So, are you just trying to figure out how to iterate through the terms from the TokenStreamComponents in your main method?
Something like this:
TokenStreamComponents TSC = createComponents( "" , new StringReader("some text goes here"));
TokenStream stream = TSC.getTokenStream();
CharTermAttribute termattr = stream.addAttribute(CharTermAttribute.class);
stream.reset();
while (stream.incrementToken()) {
System.out.println(termattr.toString());
}

Apache FOP the method newInstance(FopFactoryConfig) in the type FopFactory is not applicable for the arguments ()

I am getting the following error:
Exception in thread "main" java.lang.Error: Unresolved compilation problem: The method newInstance(FopFactoryConfig) in the type FopFactory is not applicable for the arguments ()
at fopdemo.fopvass.PDFHandler.createPDFFile(PDFHandler.java:42)
at fopdemo.fopvass.TestPDF.main(TestPDF.java:40)
This is my code:
package fopdemo.fopvass;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.URL;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;
import javax.xml.transform.Result;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerConfigurationException;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;
import javax.xml.transform.dom.DOMResult;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.apache.fop.apps.FOPException;
import org.apache.fop.apps.FOUserAgent;
import org.apache.fop.apps.Fop;
import org.apache.fop.apps.FopFactory;
import org.apache.fop.apps.MimeConstants;
public class PDFHandler {
public static final String EXTENSION = ".pdf";
public String PRESCRIPTION_URL = "template.xsl";
public String createPDFFile(ByteArrayOutputStream xmlSource, String templateFilePath) throws IOException {
File file = File.createTempFile("" + System.currentTimeMillis(), EXTENSION);
URL url = new File(templateFilePath + PRESCRIPTION_URL).toURI().toURL();
// creation of transform source
StreamSource transformSource = new StreamSource(url.openStream());
// create an instance of fop factory
FopFactory fopFactory = FopFactory.newInstance();
// a user agent is needed for transformation
FOUserAgent foUserAgent = fopFactory.newFOUserAgent();
// to store output
ByteArrayOutputStream pdfoutStream = new ByteArrayOutputStream();
StreamSource source = new StreamSource(new ByteArrayInputStream(xmlSource.toByteArray()));
Transformer xslfoTransformer;
try {
TransformerFactory transfact = TransformerFactory.newInstance();
xslfoTransformer = transfact.newTransformer(transformSource);
// Construct fop with desired output format
Fop fop;
try {
fop = fopFactory.newFop(MimeConstants.MIME_PDF, foUserAgent, pdfoutStream);
// Resulting SAX events (the generated FO)
// must be piped through to FOP
Result res = new SAXResult(fop.getDefaultHandler());
// Start XSLT transformation and FOP processing
try {
// everything will happen here..
xslfoTransformer.transform(source, res);
// if you want to save PDF file use the following code
OutputStream out = new java.io.FileOutputStream(file);
out = new java.io.BufferedOutputStream(out);
FileOutputStream str = new FileOutputStream(file);
str.write(pdfoutStream.toByteArray());
str.close();
out.close();
} catch (TransformerException e) {
e.printStackTrace();
}
} catch (FOPException e) {
e.printStackTrace();
}
} catch (TransformerConfigurationException e) {
e.printStackTrace();
} catch (TransformerFactoryConfigurationError e) {
e.printStackTrace();
}
return file.getPath();
}
public ByteArrayOutputStream getXMLSource(EmployeeData data) throws Exception {
JAXBContext context;
ByteArrayOutputStream outStream = new ByteArrayOutputStream();
try {
context = JAXBContext.newInstance(EmployeeData.class);
Marshaller m = context.createMarshaller();
m.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
m.marshal(data, System.out);
m.marshal(data, outStream);
} catch (JAXBException e) {
e.printStackTrace();
}
return outStream;
}
}
This is where it is called:
package fopdemo.fopvass;
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
/**
* #author Debasmita.Sahoo
*
*/
public class TestPDF {
public static void main(String args[]){
System.out.println("Hi Testing");
ArrayList employeeList = new ArrayList();
String templateFilePath ="C:/Paula/Proyectos/fop/fopvass/resources/";
Employee e1= new Employee();
e1.setName("Debasmita1 Sahoo");
e1.setEmployeeId("10001");
e1.setAddress("Pune");
employeeList.add(e1);
Employee e2= new Employee();
e2.setName("Debasmita2 Sahoo");
e2.setEmployeeId("10002");
e2.setAddress("Test");
employeeList.add(e2);
Employee e3= new Employee();
e3.setName("Debasmita3 Sahoo");
e3.setEmployeeId("10003");
e3.setAddress("Mumbai");
employeeList.add(e3);
EmployeeData data = new EmployeeData();
data.setEemployeeList(employeeList);
PDFHandler handler = new PDFHandler();
try {
ByteArrayOutputStream streamSource = handler.getXMLSource(data);
handler.createPDFFile(streamSource,templateFilePath);
} catch (Exception e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
}

The following line won't compile:
FopFactory fopFactory = FopFactory.newInstance();
The method newInstance() requires a parameter. From the documentation (provided by Mathias Muller), under 'Basic Usage', that parameter refers to a configuration file:
FopFactory fopFactory = FopFactory.newInstance(
new File( "C:/Temp/fop.xconf" ) );
You need to provide it a File object. Alternatively, it is also possible to create a FopFactory instance by providing a URI to resolve relative URIs in the input file (because SVG files reference other SVG files). As the documentation suggests:
FopFactory fopFactory = FopFactory.newInstance(
new File(".").toURI() );
The user agent is the entity that allows you to interact with a single rendering run, i.e. the processing of a single document. If you wish to customize the user agent's behaviour, the first step is to create your own instance of FOUserAgent using the appropriate factory method on FopFactory and pass that to the factory method that will create a new Fop instance.

FopFactory.newInstance() support
FopFactory.newInstance(File)
FopFactory.newInstance(Uri)
FopFactory.newInstance(Uri, InputStream)
FopFactory.newInstance(FopFactoryConfig)
I successfully implemented on Windows and Mac OS (i.e. a Unix\Linux like NFS) the following code
URI uri = new File(config.getRoot()).toPath().toUri();
FopFactory fopFactory = FopFactory.newInstance(uri);
On the other hand, this does not work (and I don't understand why)
URI uri = new File(config.getRoot()).toUri();

Add a watermark on a pdf that contains images using pdfbox (1.7)

I have used the code suggested in:
PDFBox Overlay fails
to add a watermark to an existing pdf.
Unfortunately, the pdf produced is corrupted. The pdf reader complains when I open the document: "An error exists on this page. Acrobat may not display the page correctly. Please contact the person who created the PDF document to correct the problem".
The document is opened but it does not show the images.
It seems to happen with all the pdfs. It could be worth saying that it happens also with a different implementation that simply uses the Overlay class.
The following url points to a pdf that I used for my testing:
A pdf with an image
The code to test this transformation is:
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.exceptions.COSVisitorException;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.PDStream;
import org.apache.pdfbox.pdmodel.edit.PDPageContentStream;
import org.apache.pdfbox.pdmodel.graphics.PDExtendedGraphicsState;
import org.apache.pdfbox.pdmodel.graphics.xobject.PDXObjectForm;
import org.apache.pdfbox.util.MapUtil;
/**
* This test is about overlaying with special effect.
*
* #author mkl
*/
public class OverlayWithEffect
{
final static File RESULT_FOLDER = new File("target/test-outputs", "assembly");
public static void overlayWithDarkenBlendMode(PDDocument document, PDDocument overlay) throws IOException
{
PDXObjectForm xobject = importAsXObject(document, (PDPage) overlay.getDocumentCatalog().getAllPages().get(0));
PDExtendedGraphicsState darken = new PDExtendedGraphicsState();
darken.getCOSDictionary().setName("BM", "Darken");
List<PDPage> pages = document.getDocumentCatalog().getAllPages();
for (PDPage page: pages)
{
if (page.getResources() == null) {
page.setResources(page.findResources());
}
if (page.getResources() != null) {
Map<String, PDExtendedGraphicsState> states = page.getResources().getGraphicsStates();
if (states == null) {
states = new HashMap<String, PDExtendedGraphicsState>();
}
String darkenKey = MapUtil.getNextUniqueKey(states, "Dkn");
states.put(darkenKey, darken);
page.getResources().setGraphicsStates(states);
PDPageContentStream stream = new PDPageContentStream(document, page, true, false, true);
stream.appendRawCommands(String.format("/%s gs ", darkenKey));
stream.drawXObject(xobject, 0, 0, 1, 1);
stream.close();
}
}
}
public static PDXObjectForm importAsXObject(PDDocument target, PDPage page) throws IOException
{
final PDStream xobjectStream = new PDStream(target, page.getContents().createInputStream(), false);
final PDXObjectForm xobject = new PDXObjectForm(xobjectStream);
xobject.setResources(page.findResources());
xobject.setBBox(page.findCropBox());
COSDictionary group = new COSDictionary();
group.setName("S", "Transparency");
group.setBoolean(COSName.getPDFName("K"), true);
xobject.getCOSStream().setItem(COSName.getPDFName("Group"), group);
return xobject;
}
public static void main(String[] args) throws COSVisitorException, IOException
{
InputStream sourceStream = new FileInputStream("x:/pdf-test.pdf");
InputStream overlayStream = new FileInputStream("x:/draft.pdf");
try {
final PDDocument document = PDDocument.load(sourceStream);
final PDDocument overlay = PDDocument.load(overlayStream);
overlayWithDarkenBlendMode(document, overlay);
document.save("x:/da-draft-5.pdf");
document.close();
}
finally {
sourceStream.close();
overlayStream.close();
}
}
}
I am using version 1.7 of pdfbox.
Thanks

As suggested by mkl, it is probably an issue with the version of pdfbox that I am using.

Parsing content from Word document using docx4j

Thanks to a previous answer, I'm now able to read my password-protected Word 2010 documents. (I have to translate them one by one from .doc to .docx. They go back to 1994, but that's okay.)
I wrote a simple Java class to get started:
package model.docx4j;
import model.JournalEntry;
import model.JournalEntryFactory;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.OpcPackage;
import org.docx4j.openpackaging.parts.Parts;
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.util.LinkedList;
import java.util.List;
/**
* JournalEntryFactoryImpl using docx4j
* #author Michael
* #link
* #since 9/8/12 12:44 PM
*/
public class JournalEntryFactoryImpl implements JournalEntryFactory {
#Override
public List<JournalEntry> getEntries(InputStream inputStream, String password) throws IOException, GeneralSecurityException {
List<JournalEntry> journalEntries = new LinkedList<JournalEntry>();
if (inputStream != null) {
try {
OpcPackage opcPackage = OpcPackage.load(inputStream, password);
Parts parts = opcPackage.getParts();
} catch (Docx4JException e) {
LOGGER.error("Could not load document into docx4j", e);
throw new IOException(e);
}
}
return journalEntries;
}
}
And a JUnit test to drive it:
package model.docx4j;
import model.JournalEntry;
import model.JournalEntryFactory;
import org.docx4j.openpackaging.exceptions.Docx4JException;
import org.docx4j.openpackaging.packages.OpcPackage;
import org.docx4j.openpackaging.parts.Parts;
import java.io.IOException;
import java.io.InputStream;
import java.security.GeneralSecurityException;
import java.util.LinkedList;
import java.util.List;
/**
* JournalEntryFactoryImpl using docx4j
* #author Michael
* #link
* #since 9/8/12 12:44 PM
*/
public class JournalEntryFactoryImpl implements JournalEntryFactory {
#Override
public List<JournalEntry> getEntries(InputStream inputStream, String password) throws IOException, GeneralSecurityException {
List<JournalEntry> journalEntries = new LinkedList<JournalEntry>();
if (inputStream != null) {
try {
OpcPackage opcPackage = OpcPackage.load(inputStream, password);
Parts parts = opcPackage.getParts();
} catch (Docx4JException e) {
LOGGER.error("Could not load document into docx4j", e);
throw new IOException(e);
}
}
return journalEntries;
}
}
I put a breakpoint into the test to see what docx4j was doing once it read my document. I see a list of 8 parts, but I walked through the tree without finding the content.
Each document consists of a page with a date and content, but I can't find pages. Where do they live?

The main document content lives in the "main document part", which is often named "/word/document.xml".
The usual way to get it with docx4j is:
WordprocessingMLPackage wordMLPackage = (WordprocessingMLPackage)opcPackage;
MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
but you'd expect your approach to work as well.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Issues converting docx to pdf using docx4j - docx4j

Solved by changing to jars. i used 2.8.0 and its fine now.

Related

docx to PDF conversion using docx4j

Query Expansion lucene

Apache FOP the method newInstance(FopFactoryConfig) in the type FopFactory is not applicable for the arguments ()

Add a watermark on a pdf that contains images using pdfbox (1.7)

Parsing content from Word document using docx4j

Categories

Resources