Text getting cut while creating PDF file using Apache PDFBox 2.0.6

I am creating a PDF file by reading a text file with Apache PDFBox 2.0.6. Some of the text that is read is not displayed in the PDF; it gets cut off.
Below is the sample program I am using:
public static void main(String[] args) {
    TextToPDF text2pdf = new TextToPDF();
    // try-with-resources closes the reader and the document even if an exception is thrown
    try (FileReader reader = new FileReader("C:/sampleTextRead2.txt");
         PDDocument doc = text2pdf.createPDFFromText(reader)) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        OutputStreamWriter writer = new OutputStreamWriter(out);
        PDFTextStripper stripper = new PDFTextStripper();
        stripper.writeText(doc, writer);
        writer.close();
        doc.save("C:/SamplePDF.pdf");
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
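One thing that may be worth ruling out, stated as an assumption because the question does not show the input file: if the lost text sits in very long runs of characters with no whitespace, a converter that wraps lines at word boundaries has nowhere to break them. The sketch below is plain Java with no PDFBox calls; `LongTokenSplitter`, `splitLongTokens`, and the `maxChars` limit are hypothetical names chosen for illustration. It inserts a space into any whitespace-free run longer than a given number of characters, so the text can be pre-processed before it is passed to the converter.

```java
// Hypothetical helper for illustration; it is not part of PDFBox.
final class LongTokenSplitter {

    /**
     * Returns the line with a single space inserted after every maxChars
     * consecutive non-whitespace characters, so that no unbroken run is
     * longer than maxChars. Existing whitespace is left untouched.
     */
    static String splitLongTokens(String line, int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        StringBuilder out = new StringBuilder();
        int run = 0; // length of the current whitespace-free run
        int i = 0;
        while (i < line.length()) {
            int cp = line.codePointAt(i); // iterate by code point to keep surrogate pairs together
            if (Character.isWhitespace(cp)) {
                run = 0;
            } else {
                if (run == maxChars) {
                    out.append(' '); // force a break opportunity
                    run = 0;
                }
                run++;
            }
            out.appendCodePoint(cp);
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A limit of 3 is only for demonstration; a real limit depends on font and page width.
        System.out.println(splitLongTokens("abcdefgh", 3));
    }
}
```

If the output still loses text after such pre-processing, the cause is likely elsewhere (for example, characters the chosen font cannot encode), and this sketch does not address that.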

Related

How can I convert DOCX to PDF using Apache POI and iText 7 with pdfCalligraph in Java?

I want to convert DOCX to PDF using Apache POI and iText 7 (with pdfCalligraph enabled).
I have tried other versions of iText, but they have problems rendering ligatures in Indic languages.
import org.apache.poi.xwpf.converter.pdf.PdfConverter;
import org.apache.poi.xwpf.converter.pdf.PdfOptions;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.springframework.util.FileCopyUtils;
import java.io.*;
public class Docx2PdfConverterUsingPOI implements Docx2PdfConverter {
    public byte[] convert(byte[] docxData) {
        byte[] output = null;
        try {
            InputStream isFromFirstData = new ByteArrayInputStream(docxData);
            XWPFDocument document = new XWPFDocument(isFromFirstData);
            PdfOptions pdfOptions = PdfOptions.create();
            // pdfOptions.fontEncoding(BaseFont.IDENTITY_H);
            // make new file in c:\temp\
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            // Options options = Options.getTo(ConverterTypeTo.PDF).via(ConverterTypeVia.XWPF).subOptions(pdfOptions);
            PdfConverter.getInstance().convert(document, out, pdfOptions);
            document.close();
            return out.toByteArray();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return output;
    }

    public static void main(String[] args) {
        Docx2PdfConverterUsingPOI docx2PdfConverterUsingPOI = new Docx2PdfConverterUsingPOI();
        String inputFile = "D:\\WORKSPACE\\yogesh\\letters\\out.docx";
        FileInputStream inputStream = null;
        try {
            inputStream = new FileInputStream(new File(inputFile));
            byte[] output = docx2PdfConverterUsingPOI.convert(FileCopyUtils.copyToByteArray(inputStream));
            FileCopyUtils.copy(output, new File("D:\\WORKSPACE\\yogesh\\letters\\out1.pdf"));
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
Can anyone help me use iText 7 with Apache POI for my DOCX to PDF conversion?
Also, can anyone explain how Apache POI uses iText to produce the converted output, so that I can change the iText Maven dependency accordingly?

Adding ColorSpace to resources causes the stream to close

I am trying a few very simple steps to add a color space to resources using PDFBox version 2.0.7, but it is not working.
I have a PDF, "pdf1.pdf". I read the color spaces from this file and add them to a HashMap, then I create new resources and try to add the color spaces to them. But it does not work.
So, as the first step, I read the color spaces from the source PDF file and add them to a HashMap:
seperationColors = new HashMap<COSName, PDColorSpace>();
PDDocument sourcePdfFile = null;
try {
    sourcePdfFile = PDDocument.load(new FileInputStream(new File(pdfPath)));
    PDPage page = sourcePdfFile.getPages().get(0);
    page.getContents();
    for (COSName name : page.getResources().getColorSpaceNames()) {
        PDColor color = page.getResources().getColorSpace(name).getInitialColor();
        if (color.getColorSpace() instanceof PDSeparation) {
            seperationColors.put(name, page.getResources().getColorSpace(name));
        }
    }
} catch (FileNotFoundException e) {
    // e.printStackTrace();
} catch (IOException e) {
    // e.printStackTrace();
} finally {
    if (sourcePdfFile != null)
        try {
            sourcePdfFile.close();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            sourcePdfFile = null;
        }
}
}
Then, at a later stage in the code, I want to create a new PDF document and add the color spaces from the source PDF to it:
PDResources newResources = new PDResources();
PDColorSpace colorSpace = originalDocumentColorSpaces.values().iterator().next();
newResources.add(colorSpace);
After the add operation (line 3), newResources shows the error: COSDictionary{COSStream has been closed and cannot be read. Perhaps its enclosing PDDocument has been closed?}
colorSpace is of type PDSeparation.
Any clue?

Parsing PDF file using Apache PDFBox to get outlines

I can use PDFBox to extract the outlines from a PDF, but it works for some PDFs and not for others.
Every one of these PDFs has outlines: when I open one in a PDF reader, I can click an outline entry to jump to a certain page.
Here is my code:
public static void main(String[] args) {
    // try-with-resources closes the document even if the traversal throws
    try (PDDocument document = PDDocument.load(new File(filePath))) {
        PDDocumentOutline outline = document.getDocumentCatalog().getDocumentOutline();
        getOutlines(document, outline, "");
    } catch (InvalidPasswordException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
}

public static void getOutlines(PDDocument document, PDOutlineNode bookmark, String indentation) throws IOException {
    PDOutlineItem current = bookmark.getFirstChild();
    while (current != null) {
        PDPage currentPage = current.findDestinationPage(document);
        int pageNumber = document.getDocumentCatalog().getPages().indexOf(currentPage) + 1;
        // Prefix the title with the indentation so nested entries are visible in the output
        System.out.println(indentation + current.getTitle() + "-------->" + pageNumber);
        // Recurse into the children of this entry, one level deeper
        getOutlines(document, current, indentation + "    ");
        current = current.getNextSibling();
    }
}
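The first-child / next-sibling walk that getOutlines performs can be tried without any PDF at hand. The following is a minimal sketch in plain Java: `OutlineWalk`, `Node`, and `collect` are hypothetical stand-ins that only mirror the shape of getFirstChild() and getNextSibling(); none of them is PDFBox API. It also shows one way to grow the indentation on each recursion level so that nesting is visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in types for illustration; not PDFBox classes.
final class OutlineWalk {

    /** A tree node that knows only its first child and its next sibling. */
    static final class Node {
        final String title;
        Node firstChild;
        Node nextSibling;

        Node(String title) {
            this.title = title;
        }
    }

    /** Appends the titles of all descendants of parent, depth-first, indented by level. */
    static void collect(Node parent, String indentation, List<String> out) {
        Node current = parent.firstChild;
        while (current != null) {
            out.add(indentation + current.title);
            collect(current, indentation + "  ", out); // children are one level deeper
            current = current.nextSibling;
        }
    }

    public static void main(String[] args) {
        Node root = new Node("root");
        Node chapter1 = new Node("Chapter 1");
        Node section11 = new Node("Section 1.1");
        Node chapter2 = new Node("Chapter 2");
        root.firstChild = chapter1;
        chapter1.firstChild = section11;
        chapter1.nextSibling = chapter2;

        List<String> lines = new ArrayList<>();
        collect(root, "", lines);
        lines.forEach(System.out::println);
    }
}
```

The sketch says nothing about why a destination page might not resolve for a particular PDF; it only isolates the traversal so that part can be checked separately.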

ANTLR test class not compiling?

I've cobbled together some code to test a lexer/parser grammar, but I'm stuck on how to create the appropriate file input / stream objects to parse a file. My code is as follows. I'm getting an error about giving the BasicLexer class constructor an ANTLRInputStream instead of a CharStream, and a similar message about giving the BasicParser a CommonTokenStream (it expects a TokenStream). Any ideas on where I've gone wrong?
public static void main(String[] args) throws Exception {
    String filename = args[0];
    // Opening the stream in try-with-resources guarantees it is assigned before use and closed afterwards
    try (InputStream is = new FileInputStream(filename)) {
        ANTLRInputStream in = new ANTLRInputStream(is);
        BasicLexer lexer = new BasicLexer(in);
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        BasicParser parser = new BasicParser(tokens);
        parser.eval();
    }
}

Adding a new revision for a document in Dropbox through the Android API

I want to add a new revision to a document (Test.doc) in Dropbox using the Android API. Can anyone share some sample code or links? I tried:
FileInputStream inputStream = null;
try {
    DropboxInputStream temp = mDBApi.getFileStream("/Test.doc", null);
    String revision = temp.getFileInfo().getMetadata().rev;
    Log.d("REVISION : ", revision);
    File file = new File("/sdcard0/renamed.doc");
    inputStream = new FileInputStream(file);
    Entry newEntry = mDBApi.putFile("/Test.doc", inputStream, file.length(), revision, new ProgressListener() {
        @Override
        public void onProgress(long arg0, long arg1) {
            Log.d("", "Uploading.. " + arg0 + ", Total : " + arg1);
        }
    });
} catch (Exception e) {
    System.out.println("Something went wrong: " + e);
} finally {
    if (inputStream != null) {
        try {
            inputStream.close();
        } catch (IOException e) {}
    }
}
A new revision is created the first time. When I execute the code again, another new revision is not created.