How can I convert DOCX to PDF using Apache POI and iText 7 with pdfCalligraph in Java?

I want to convert DOCX to PDF using Apache POI and iText 7 (with pdfCalligraph enabled).
I have tried other versions of iText, but they render ligatures incorrectly in Indic languages.
import org.apache.poi.xwpf.converter.pdf.PdfConverter;
import org.apache.poi.xwpf.converter.pdf.PdfOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.springframework.util.FileCopyUtils;
import java.io.*;
public class Docx2PdfConverterUsingPOI implements Docx2PdfConverter{
public byte[] convert(byte[] docxData) {
byte[] output = null;
try {
InputStream isFromFirstData = new ByteArrayInputStream(docxData);
XWPFDocument document = new XWPFDocument(isFromFirstData);
PdfOptions pdfOptions = PdfOptions.create();
// pdfOptions.fontEncoding(BaseFont.IDENTITY_H);
//make new file in c:\temp\
ByteArrayOutputStream out = new ByteArrayOutputStream();
//Options options = Options.getTo(ConverterTypeTo.PDF).via(ConverterTypeVia.XWPF).subOptions(pdfOptions);
PdfConverter.getInstance().convert(document, out, pdfOptions);
document.close();
return out.toByteArray();
} catch (IOException e) {
e.printStackTrace();
}
return output;
}
public static void main(String args[]){
Docx2PdfConverterUsingPOI docx2PdfConverterUsingPOI = new Docx2PdfConverterUsingPOI();
String inputFile = "D:\\WORKSPACE\\yogesh\\letters\\out.docx";
FileInputStream inputStream = null;
try {
inputStream = new FileInputStream(new File(inputFile));
byte[] output = docx2PdfConverterUsingPOI.convert(FileCopyUtils.copyToByteArray(inputStream));
FileCopyUtils.copy(output, new File("D:\\WORKSPACE\\yogesh\\letters\\out1.pdf"));
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
}
}
Can anyone help me use iText 7 with Apache POI for my DOCX to PDF conversion?
Also, can anyone explain how the Apache POI converter uses iText to produce the converted output, so that I can change the iText Maven dependency accordingly?
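The file handling in the main method above does not need Spring's FileCopyUtils; the JDK's java.nio.file.Files can read and write the byte arrays directly. Below is a minimal sketch of that part only, not of the iText 7 integration itself. FileRoundTrip and convertFile are hypothetical names, and the converter argument is a stand-in for any byte[] to byte[] conversion such as the convert method in the question.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.function.UnaryOperator;

// Hypothetical helper: reads a file, runs a byte[] -> byte[] conversion, writes the result.
final class FileRoundTrip {

    static void convertFile(Path input, Path output, UnaryOperator<byte[]> converter)
            throws IOException {
        // Read the whole source document into memory.
        byte[] source = Files.readAllBytes(input);

        // Run the supplied conversion (a stand-in for the DOCX to PDF step).
        byte[] result = converter.apply(source);

        // The converter in the question returns null on failure, so fail loudly here
        // instead of writing nothing.
        if (result == null) {
            throw new IOException("Conversion produced no output for " + input);
        }

        // Write the converted bytes, creating or overwriting the target file.
        Files.write(output, result);
    }
}
```

Assuming the class from the question, the conversion could be passed as a method reference, for example new Docx2PdfConverterUsingPOI()::convert.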

Related

Spacing issues when converting DOCX to PDF

I am trying to convert a document to PDF using either Apache POI or Docx4j (Apache FOP), and both have problems during conversion.
With Apache POI, the output contains extra blank lines that are not in the source document.
With Apache FOP, the table heading and the table are shifted to the next page.
I am new to this.
Can you advise which one would be better, and how to avoid these extra spaces?
Here are the code samples.
Sample code 1, using Apache POI:
try
{
oLog.error("function POI begin");
String WordDocFile= "C:\\PRPCPersonalEdition\\tomcat\\TestDocument.docx";
oLog.error("WordDocFile is"+WordDocFile);
String pdfDocFile="C:\\PRPCPersonalEdition\\tomcat\\TestDocument.pdf";
oLog.error("pdfDocFile is"+pdfDocFile);
InputStream iStream= new FileInputStream(WordDocFile);
OutputStream os = new FileOutputStream(pdfDocFile);
org.apache.poi.xwpf.usermodel.XWPFDocument document=new org.apache.poi.xwpf.usermodel.XWPFDocument(iStream);
oLog.error("Loaded XWPFDocument");
org.apache.poi.xwpf.converter.pdf.PdfOptions options= org.apache.poi.xwpf.converter.pdf.PdfOptions.create() ;
org.apache.poi.xwpf.converter.pdf.PdfConverter.getInstance().convert(document,os,options);
iStream.close();
os.close();
return "Success";
}
catch(Exception e)
{
e.printStackTrace();
oLog.error("print exception"+e);
return "FAIL";
}
Sample code 2, using Docx4j:
try
{ final org.docx4j.wml.ObjectFactory objectFactory = new org.docx4j.wml.ObjectFactory();
String WordDocFile= "C:\\PRPCPersonalEdition\\tomcat\\TestDocument.docx";
String pdfDocFile="C:\\PRPCPersonalEdition\\tomcat\\TestDocument.pdf";
InputStream iStream= new FileInputStream(WordDocFile);
org.docx4j.openpackaging.packages.WordprocessingMLPackage pack = org.docx4j.openpackaging.packages.WordprocessingMLPackage.load(iStream);
org.docx4j.openpackaging.parts.WordprocessingML.MainDocumentPart docPart = pack.getMainDocumentPart();
OutputStream os = new FileOutputStream(pdfDocFile);
org.docx4j.Docx4J.toPDF(pack,os);
iStream.close();
os.close();
return "Success";
}
catch(Exception e)
{
e.printStackTrace();
oLog.error("print exception"+e);
return "FAIL";
}
Sample code 3, using iText:
try
{
java.util.List<java.io.InputStream> lstInputStream = new java.util.ArrayList<java.io.InputStream>();
oLog.error("function PDF iText begin");
String WordDocFile= "C:\\PRPCPersonalEdition\\tomcat\\DraftMin.docx";
String pdfDocFile="C:\\PRPCPersonalEdition\\tomcat\\DraftMinItext12.pdf";
InputStream iStream= new FileInputStream(WordDocFile);
oLog.error("Read Input file"+iStream);
java.io.InputStream isCurrentStream = null;
OutputStream os = new FileOutputStream(pdfDocFile);
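// Must be a real stream: the converter writes the intermediate PDF into it
java.io.ByteArrayOutputStream osCurrentStream = new java.io.ByteArrayOutputStream();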
org.apache.poi.xwpf.usermodel.XWPFDocument document=new org.apache.poi.xwpf.usermodel.XWPFDocument(iStream);
fr.opensagres.poi.xwpf.converter.pdf.PdfOptions options = fr.opensagres.poi.xwpf.converter.pdf.PdfOptions.create();
fr.opensagres.poi.xwpf.converter.pdf.PdfConverter.getInstance().convert(document,(java.io.OutputStream)osCurrentStream, options);
com.itextpdf.text.Document finalDocument = new com.itextpdf.text.Document();
finalDocument.addAuthor("Your Corporation");
finalDocument.addCreationDate();
finalDocument.addProducer();
finalDocument.addCreator("System");
finalDocument.addTitle("Your Corporation");
finalDocument.setPageSize(com.itextpdf.text.PageSize.A4);
com.itextpdf.text.pdf.PdfCopy copy1 = new com.itextpdf.text.pdf.PdfCopy(finalDocument, (java.io.OutputStream) os);
finalDocument.open();
PRFile file = new PRFile("C:\\PRPCPersonalEdition\\tomcat\\Savepdf.pdf");
PROutputStream prOS = new PROutputStream(file);
prOS.write(osCurrentStream.toByteArray());
oLog.debug("file write");
prOS.close();
PRInputStream prIS = new PRInputStream(file);
oLog.debug("file read");
isCurrentStream = (java.io.InputStream) prIS;
oLog.debug("casting to java.io.InputStream");
file.delete();
lstInputStream.add(isCurrentStream);
java.util.Iterator < java.io.InputStream > pdfIterator = lstInputStream.iterator();
while (pdfIterator.hasNext()) {
java.io.InputStream pdf = pdfIterator.next();
com.itextpdf.text.pdf.PdfReader.unethicalreading = true;
com.itextpdf.text.pdf.PdfReader readingpdf = new com.itextpdf.text.pdf.PdfReader(pdf);
int n = readingpdf.getNumberOfPages();
// Page numbers are 1-based, so the last page is n itself
for ( int i=1;i<=n;i++)
{
oLog.error("Entered in loop"+i);
copy1.addPage(copy1.getImportedPage(readingpdf, i));
oLog.error("Read Page n : "+i);
}
}
finalDocument.close();
iStream.close();
os.close();
return "Success";
}
catch(Exception e)
{
e.printStackTrace();
oLog.error("print exception"+e);
return "FAIL";
}

Parsing PDF file using Apache PDFBox to get outlines

I can use PDFBox to extract the outlines from a PDF, but it only works for some files; for others no outlines are returned.
Every one of these PDFs has outlines: when I open a file in a PDF reader, I can click an outline entry to jump to a certain page.
Here is my code:
public static void main(String[] args) {
try {
PDDocument document = PDDocument.load(new File(filePath));
PDDocumentOutline outline = document.getDocumentCatalog().getDocumentOutline();
getOutlines(document, outline, "");
document.close();
} catch (InvalidPasswordException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}
public static void getOutlines(PDDocument document, PDOutlineNode bookmark, String indentation) throws IOException{
PDOutlineItem current = bookmark.getFirstChild();
while (current != null) {
PDPage currentPage = current.findDestinationPage(document);
Integer pageNumber = document.getDocumentCatalog().getPages().indexOf(currentPage) + 1;
System.out.println(indentation + current.getTitle() + "-------->" + pageNumber);
// Indent child entries one level deeper than their parent
getOutlines(document, current, indentation + "    ");
current = current.getNextSibling();
}
}

Text gets cut off when creating a PDF file using Apache PDFBox 2.0.6

I am creating a PDF file from a text file using Apache PDFBox 2.0.6. Some of the text that is read is not displayed; it gets cut off.
Below is the sample program I am using:
public static void main(String[] args) {
// TODO Auto-generated method stub
PDDocument doc = null;
TextToPDF text2pdf = new TextToPDF();
try {
doc = text2pdf.createPDFFromText(new FileReader("C:/sampleTextRead2.txt"));
ByteArrayOutputStream out = new ByteArrayOutputStream();
OutputStreamWriter writer = new OutputStreamWriter(out);
PDFTextStripper stripper = new PDFTextStripper();
stripper.writeText(doc, writer);
writer.close();
doc.save("C:/SamplePDF.pdf");
} catch (FileNotFoundException e) {
// TODO Auto-generated catch block
e.printStackTrace();
} catch (IOException e) {
// TODO Auto-generated catch block
e.printStackTrace();
}
}

Webdings font characters not extracted using pdfbox

I am using PDFBox to get the names of all fonts used in a PDF.
This worked well until I recently came across a PDF that uses the 'Webdings' font, which PDFBox did not report. Could anyone help, please?
This is the code I have used:
public static Set<String> extractFonts(String pdfPath) throws IOException
{
PDDocument doc = PDDocument.load(new File(pdfPath));
PDPageTree pages = doc.getDocumentCatalog().getPages();
Set<String> fontSet = new HashSet<String>();
try{
for(PDPage page:pages){
PDResources res = page.getResources();
for (COSName fontName : res.getFontNames())
{
PDFont font = res.getFont(fontName);
if(font != null){
String fontUsedName = font.getName();
if(fontUsedName.contains("+")) {
fontUsedName = fontUsedName.substring(fontUsedName.indexOf("+")+1, fontUsedName.length());
}
fontSet.add(fontUsedName);
}
}
}
} catch (Exception e) {
e.printStackTrace();
}
System.out.println(fontSet);
return fontSet;
}
I know the 'Webdings' font is present because it is listed under File -> Properties -> Fonts in Adobe Reader.
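The code above removes everything up to the first "+" in the font name. In PDF, a subset font name carries a tag of exactly six uppercase letters followed by "+", so a slightly stricter version would strip only that pattern and also tolerate a missing name. Below is a minimal sketch of just that string handling, using only the JDK; FontNames and stripSubsetTag are hypothetical names, and this does not by itself explain why Webdings is not reported.

```java
import java.util.regex.Pattern;

// Hypothetical helper for cleaning up font names read from a PDF.
final class FontNames {

    // A subset tag is six uppercase ASCII letters followed by '+', e.g. "ABCDEF+".
    private static final Pattern SUBSET_TAG = Pattern.compile("^[A-Z]{6}\\+");

    // Returns the name without a leading subset tag; passes null through unchanged.
    static String stripSubsetTag(String fontName) {
        if (fontName == null) {
            return null;
        }
        return SUBSET_TAG.matcher(fontName).replaceFirst("");
    }

    public static void main(String[] args) {
        System.out.println(stripSubsetTag("ABCDEF+Webdings")); // prints "Webdings"
        System.out.println(stripSubsetTag("Arial+Narrow"));    // prints "Arial+Narrow"
    }
}
```

Unlike a plain indexOf("+") cut, this leaves names alone when the "+" is not part of a subset tag.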

How to display a pdf file using PDFBox in JPanel?

I have already created a JForm in NetBeans that can read a PDF file using PDFBox. The problem is that I used the method PDPage.convertToImage(), which is very slow. Can anyone please help me display the PDF in a JPanel faster using PDFBox?
The code I have written is inside an ActionListener for a JButton.
File f = null;
ArrayList<JLabel> jl = new ArrayList<JLabel>();
BufferedImage bi = null;
JFileChooser fc = new JFileChooser();
int x=fc.showOpenDialog(null);
if(x==JFileChooser.APPROVE_OPTION)
{
f=fc.getSelectedFile();
}
PDDocument doc=null;
try {
doc = PDDocument.load(f);
} catch (IOException ex) {
JOptionPane.showMessageDialog(null, "not done\n"+ex);
}
List pages = doc.getDocumentCatalog().getAllPages();
Iterator itr = pages.iterator();
int q=0;
while(itr.hasNext())
{
PDPage page = (PDPage)itr.next();
try
{
bi = page.convertToImage();
q++;
jl.add(new JLabel(new ImageIcon(bi)));
}catch(Exception e)
{
JOptionPane.showMessageDialog(null, e);
}
}
itr = jl.iterator();
while(itr.hasNext())
{
viewPanel.setVisible(false);
viewPanel.add((JLabel)itr.next());
viewPanel.setVisible(true);
}
JOptionPane.showMessageDialog(null, "done");
NetBeans has several plugins to display PDFs:
http://plugins.netbeans.org/plugin/5809/java-pdf-reader
http://plugins.netbeans.org/plugin/11676/netbeans-pdfviewer
http://plugins.netbeans.org/plugin/17/pdf-viewer-javafx-converter-and-bookmarking-application
Have you tried any of them?
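Independently of which viewer is used, one general way to soften a slow per-page render is to avoid converting every page up front: render a page only when it is first requested and keep a few recent pages in memory. Below is a minimal sketch of that idea using only the JDK. PageImageCache and the renderer parameter are hypothetical; the renderer stands in for whatever call produces a page image (for example the convertToImage() call in the question) and is not a PDFBox API.

```java
import java.awt.image.BufferedImage;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntFunction;

// Hypothetical cache: renders pages lazily and keeps only the most recently used ones.
final class PageImageCache {

    private final IntFunction<BufferedImage> renderer; // stand-in for the slow render call
    private final Map<Integer, BufferedImage> cache;

    PageImageCache(IntFunction<BufferedImage> renderer, final int maxPages) {
        this.renderer = renderer;
        // Access-ordered LinkedHashMap: the eldest entry is the least recently used one.
        this.cache = new LinkedHashMap<Integer, BufferedImage>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<Integer, BufferedImage> eldest) {
                return size() > maxPages;
            }
        };
    }

    // Returns the image for a page, rendering it only if it is not cached.
    synchronized BufferedImage get(int pageIndex) {
        BufferedImage image = cache.get(pageIndex);
        if (image == null) {
            image = renderer.apply(pageIndex);
            cache.put(pageIndex, image);
        }
        return image;
    }
}
```

In a Swing application the get call would typically run on a background thread (for instance in a SwingWorker) so the slow render does not block the event dispatch thread; that part is left out of the sketch.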