PDF to image conversion using PDFBox


When a fillable PDF is converted to JPEG using PDFBox, the tick in a checkbox is rendered as a box character, and this warning is logged:
WARN [org.apache.pdfbox.rendering.Type1Glyph2D] No glyph for code 52 (a20) in font ZapfDingbats
public static void main(String[] args) throws Exception{
try (final PDDocument document = PDDocument.load(new File("C:\\Users\\priyadarshini.s\\Downloads\\ADWE3244_Merge(1).pdf"))){
ClassLoader classloader = Thread.currentThread().getContextClassLoader();
InputStream is = classloader.getResourceAsStream("zapfdingbats.ttf");
PDFRenderer pdfRenderer = new PDFRenderer(document);
PDFont font = PDType0Font.load(document,is); //PDTrueTypeFont.loadTTF(document, new File( "c:/arial.ttf" ));
//font.setWidths(PDType1Font.HELVETICA.getWidths());
for (int page = 0; page < document.getNumberOfPages(); ++page)
{
BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
String fileName = OUTPUT_DIR + "image-" + page + ".jpg";
ImageIOUtil.writeImage(bim, fileName, 300);
}
document.close();
} catch (IOException e){
System.err.println("Exception while trying to create pdf document - " + e);
}
}
How do I set the font in the PDF-to-image code?

The problem may be related to fonts: Zapf Dingbats and/or MS Gothic may be missing.
Try installing the missing fonts in a "./fonts" directory, in "/usr/share/fonts" on Linux,
or in "/Windows/Fonts" on Windows.
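Before rerunning the conversion, it can help to check whether matching font files are actually present in those directories. The sketch below is a hypothetical helper (FontFileFinder.findFontFiles is not part of PDFBox): it recursively lists .ttf, .ttc, .otf and .pfb files whose names contain a keyword. Font file names vary between vendors, so a missing match is only a hint, not proof that the font is absent.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class FontFileFinder {

    // Font file extensions to report (an assumed list; adjust as needed)
    private static final List<String> FONT_EXTENSIONS = Arrays.asList(".ttf", ".ttc", ".otf", ".pfb");

    // Recursively lists font files under the given directories whose file name
    // contains the keyword (case-insensitive). Directories that do not exist are skipped.
    static List<Path> findFontFiles(List<Path> dirs, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        List<Path> result = new ArrayList<>();
        for (Path dir : dirs) {
            if (!Files.isDirectory(dir)) {
                continue;
            }
            try (Stream<Path> walk = Files.walk(dir)) {
                result.addAll(walk
                        .filter(Files::isRegularFile)
                        .filter(p -> {
                            String name = p.getFileName().toString().toLowerCase(Locale.ROOT);
                            return name.contains(needle)
                                    && FONT_EXTENSIONS.stream().anyMatch(name::endsWith);
                        })
                        .collect(Collectors.toList()));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Path> dirs = Arrays.asList(
                Paths.get("./fonts"), Paths.get("/usr/share/fonts"), Paths.get("/Windows/Fonts"));
        System.out.println("Gothic: " + findFontFiles(dirs, "gothic"));
        System.out.println("Dingbats: " + findFontFiles(dirs, "dingbat"));
    }
}
```

An empty list for both keywords suggests the fonts still need to be installed.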

Related

Understanding loading of font in PDFBox 2.0

I have finally succeeded in making PDFBox print Unicode characters.
But now, I would like to understand the solution that I have come up with.
The code below works and prints a ≥ to the page.
Two things do not work:
changing
PDType0Font.load(documentMock, systemResourceAsStream, true);
to
PDType0Font.load(documentMock, systemResourceAsStream, false);
changing
final PDFont robotoLight = loadFontAlternative("Roboto-Light.ttf");
to
final PDFont robotoLight = loadFont("Roboto-Light.ttf");
The first change prints two dots instead of the character.
What does embedSubset do, since it does not work when set to false?
The documentation is too sparse for me to understand.
The second change throws the following exception: Exception in thread "main" java.lang.IllegalArgumentException: U+2265 is not available in this font's encoding: WinAnsiEncoding
This problem has been covered in many other questions, but those predate PDFBox 2.0, and older versions had a bug in handling Unicode.
So they do not answer the question directly.
That aside, the problem is clear: I should not set the encoding to WinAnsiEncoding but to something else.
But what should the encoding be? And why is there no UTF-8 encoding or similar available?
There is no documentation in COSName about the many options.
public class SimpleReportUnicode {
public static void main(String[] args) throws IOException {
PDDocument report = createReport();
final String fileLocation = "c:/SimpleFormUnicode.pdf";
report.save(fileLocation);
report.close();
}
private static PDDocument createReport() throws IOException {
PDDocument document = new PDDocument();
PDPage page = new PDPage();
document.addPage(page);
PDPageContentStream contentStream = new PDPageContentStream(document, page);
final PDFont robotoLight = loadFontAlternative("Roboto-Light.ttf");
writeText(contentStream, robotoLight, 100, 650);
contentStream.close();
return document;
}
private static void writeText(PDPageContentStream contentStream, PDFont font, double x, double y) {
try {
contentStream.beginText();
contentStream.setFont(font, 12);
contentStream.moveTextPositionByAmount((float) x, (float) y);
String unicode = "≥";
contentStream.showText(unicode);
contentStream.endText();
}
catch (IOException e) {
}
}
private static PDFont loadFont(String location) {
PDFont font;
try {
PDDocument documentMock = new PDDocument();
InputStream systemResourceAsStream = ClassLoader.getSystemResourceAsStream(location);
Encoding encoding = Encoding.getInstance(COSName.WIN_ANSI_ENCODING);
font = PDTrueTypeFont.load(documentMock, systemResourceAsStream, encoding);
}
catch (IOException e) {
throw new RuntimeException("IO exception");
}
return font;
}
private static PDFont loadFontAlternative(String location) {
PDDocument documentMock = new PDDocument();
InputStream systemResourceAsStream = ClassLoader.getSystemResourceAsStream(location);
PDFont font;
try {
font = PDType0Font.load(documentMock, systemResourceAsStream, true);
}
catch (IOException e) {
throw new RuntimeException("IO exception");
}
return font;
}
}
EDIT
If you want to use the same font as in the code, Roboto is available here:
https://fonts.google.com/specimen/Roboto
Add Roboto-Light.ttf to your classpath and the code should work out of the box.

As discussed in the comments:
The problem with embedSubset went away by using version 2.0.7 (2.0.8 was released on the day of this answer);
The problem "U+2265 is not available in this font's encoding: WinAnsiEncoding" is explained in the FAQ and the solution is to use PDType0Font.load() which you already did in your working version;
There is no UTF-8 encoding for fonts because it isn't available in the PDF specification;
using embedSubset true produces a 4KB file; with false the file is 100KB because the full font is embedded, so true is usually best.
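To see why U+2265 fails with WinAnsiEncoding, you can check which characters a single-byte Windows code page can represent at all. The sketch below uses the JDK's windows-1252 charset as an approximation of PDF's WinAnsiEncoding (an assumption: the two are close but not identical in every position). PDType0Font, which the answer recommends, is not limited to such a 256-entry table.

```java
import java.nio.charset.Charset;

public class WinAnsiCheck {

    // windows-1252 is used as a close stand-in for PDF's WinAnsiEncoding
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Returns true if every character of the text has a code in windows-1252
    static boolean fitsWinAnsi(String text) {
        return CP1252.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        System.out.println(fitsWinAnsi("caf\u00e9 \u20ac")); // true: é and € both exist in windows-1252
        System.out.println(fitsWinAnsi("\u2265"));            // false: ≥ (U+2265) has no code there
    }
}
```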

Webdings font characters not extracted using pdfbox

I am using PDFBox to get the names of all fonts used in a PDF.
So far this has worked well. However, I recently came across a PDF that uses the 'Webdings' font, and PDFBox does not list it. Could anyone help, please?
This is the code I have used:
public static Set<String> extractFonts(String pdfPath) throws IOException
{
Set<String> fontSet = new HashSet<String>();
// try-with-resources closes the document when done
try (PDDocument doc = PDDocument.load(new File(pdfPath))) {
PDPageTree pages = doc.getDocumentCatalog().getPages();
for (PDPage page : pages) {
PDResources res = page.getResources();
if (res == null) {
continue; // page has no resources
}
for (COSName fontName : res.getFontNames())
{
PDFont font = res.getFont(fontName);
// skip fonts that could not be loaded or have no name
if (font != null && font.getName() != null) {
String fontUsedName = font.getName();
// drop the subset prefix, e.g. "ABCDEF+Arial" -> "Arial"
if (fontUsedName.contains("+")) {
fontUsedName = fontUsedName.substring(fontUsedName.indexOf("+") + 1);
}
fontSet.add(fontUsedName);
}
}
}
} catch (IOException e) {
e.printStackTrace();
}
System.out.println(fontSet);
return fontSet;
}
I know that the font 'Webdings' is present because it is listed under File > Properties > Fonts in Adobe Reader.
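The code above strips everything up to the first "+", which also cuts font names that merely contain a plus sign. In the PDF specification, a subset tag is exactly six uppercase letters followed by "+", so a stricter check is possible. The helper below is a hypothetical sketch (stripSubsetTag is not a PDFBox method); it only tidies the names and does not by itself explain why Webdings is missing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SubsetTag {

    // Subset tag: six uppercase letters followed by '+', e.g. "BCDEEE+Webdings"
    private static final Pattern SUBSET_TAG = Pattern.compile("^[A-Z]{6}\\+(.+)$");

    // Returns the font name without a subset tag; other names (and null) pass through unchanged
    static String stripSubsetTag(String fontName) {
        if (fontName == null) {
            return null;
        }
        Matcher m = SUBSET_TAG.matcher(fontName);
        return m.matches() ? m.group(1) : fontName;
    }

    public static void main(String[] args) {
        System.out.println(stripSubsetTag("BCDEEE+Webdings")); // Webdings
        System.out.println(stripSubsetTag("Webdings"));        // Webdings
        System.out.println(stripSubsetTag("Foo+Bar"));         // Foo+Bar (not a subset tag)
    }
}
```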

Generating PDF with iText and batik

I'm trying to export text and SVG graphs to a PDF. I found out that iText and Batik can do this. So I tried it, but every time I add a graph, it becomes extraordinarily small.
I thought it might be something in my code, so I tried the example code from Vaadin.
public class PdfExportDemo {
private String fontDirectory = null;
private final String baseFont = "Arial";
private PdfWriter writer;
private Document document;
private Font captionFont;
private Font normalFont;
private String svgStr;
/**
* Writes a PDF file with some static example content plus embeds the chart
* SVG.
*
* @param pdffilename
* PDF's filename
* @param svg
* SVG as a String
* @return PDF File
*/
public File writePdf(String pdffilename, String svg) {
svgStr = svg;
document = new Document();
document.addTitle("PDF Sample");
document.addCreator("Vaadin");
initFonts();
File file = null;
try {
file = writeToFile(pdffilename, document);
document.open();
writePdfContent();
document.close();
} catch (DocumentException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
return file;
}
/**
* Get Font directory that will be checked for custom fonts.
*
* @return Path to fonts
*/
public String getFontDirectory() {
return fontDirectory;
}
/**
* Set Font directory that will be checked for custom fonts.
*
* @param fontDirectory
* Path to fonts
*/
public void setFontDirectory(String fontDirectory) {
this.fontDirectory = fontDirectory;
}
private void initFonts() {
if (fontDirectory != null) {
FontFactory.registerDirectory(fontDirectory);
}
captionFont = FontFactory.getFont(baseFont, 10, Font.BOLD, new Color(0,
0, 0));
normalFont = FontFactory.getFont(baseFont, 10, Font.NORMAL, new Color(
0, 0, 0));
}
private File writeToFile(String filename, Document document)
throws DocumentException {
File file = null;
try {
file = File.createTempFile(filename, ".pdf");
file.deleteOnExit();
FileOutputStream fileOut = new FileOutputStream(file);
writer = PdfWriter.getInstance(document, fileOut);
} catch (FileNotFoundException e) {
e.printStackTrace();
} catch (IOException e) {
e.printStackTrace();
}
return file;
}
private void writePdfContent() throws DocumentException, IOException {
Paragraph caption = new Paragraph();
caption.add(new Chunk("Vaadin Charts Export Demo PDF", captionFont));
document.add(caption);
Paragraph br = new Paragraph(Chunk.NEWLINE);
document.add(br);
Paragraph paragraph = new Paragraph();
paragraph.add(new Chunk("This PDF is rendered with iText 2.1.7.",
normalFont));
document.add(paragraph);
paragraph = new Paragraph();
paragraph
.add(new Chunk(
"Chart below is originally an SVG image created with Vaadin Charts and rendered with help of Batik SVG Toolkit.",
normalFont));
document.add(paragraph);
document.add(createSvgImage(writer.getDirectContent(), 400, 400));
document.add(createExampleTable());
}
private PdfPTable createExampleTable() throws BadElementException {
PdfPTable table = new PdfPTable(2);
table.setHeaderRows(1);
table.setWidthPercentage(100);
table.setTotalWidth(100);
// Add headers
table.addCell(createHeaderCell("Browser"));
table.addCell(createHeaderCell("Percentage"));
// Add rows
table.addCell(createCell("Firefox"));
table.addCell(createCell("45.0"));
table.addCell(createCell("IE"));
table.addCell(createCell("26.8"));
table.addCell(createCell("Chrome"));
table.addCell(createCell("12.8"));
table.addCell(createCell("Safari"));
table.addCell(createCell("8.5"));
table.addCell(createCell("Opera"));
table.addCell(createCell("6.2"));
table.addCell(createCell("Others"));
table.addCell(createCell("0.7"));
return table;
}
private PdfPCell createHeaderCell(String caption)
throws BadElementException {
Chunk chunk = new Chunk(caption, captionFont);
Paragraph p = new Paragraph(chunk);
p.add(Chunk.NEWLINE);
p.add(Chunk.NEWLINE);
PdfPCell cell = new PdfPCell(p);
cell.setBorder(0);
cell.setBorderWidthBottom(1);
cell.setHorizontalAlignment(PdfPCell.ALIGN_LEFT);
cell.setVerticalAlignment(PdfPCell.ALIGN_MIDDLE);
return cell;
}
private PdfPCell createCell(String value) throws BadElementException {
PdfPCell cell = new PdfPCell(new Phrase(new Chunk(value, normalFont)));
cell.setBorder(0);
cell.setHorizontalAlignment(PdfPCell.ALIGN_LEFT);
return cell;
}
private Image drawUnscaledSvg(PdfContentByte contentByte)
throws IOException {
// First, let's create a graphics node for the SVG image.
GraphicsNode imageGraphics = buildBatikGraphicsNode(svgStr);
// SVG's width and height
float width = (float) imageGraphics.getBounds().getWidth();
float height = (float) imageGraphics.getBounds().getHeight();
// Create a PDF template for the SVG image
PdfTemplate template = contentByte.createTemplate(width, height);
// Create Graphics2D rendered object from the template
Graphics2D graphics = template.createGraphics(width, height);
try {
// SVGs can have their corner at coordinates other than (0,0).
Rectangle2D bounds = imageGraphics.getBounds();
graphics.translate(-bounds.getX(), -bounds.getY());
// Paint SVG GraphicsNode with the 2d-renderer.
imageGraphics.paint(graphics);
// Create and return an iText Image element that contains the SVG
// image.
return new ImgTemplate(template);
} catch (BadElementException e) {
throw new RuntimeException("Couldn't generate PDF from SVG", e);
} finally {
// Manual cleaning (optional)
graphics.dispose();
}
}
/**
* Use Batik SVG Toolkit to create GraphicsNode for the target SVG.
* <ol>
* <li>Create SVGDocument</li>
* <li>Create BridgeContext</li>
* <li>Build GVT tree. Results to GraphicsNode</li>
* </ol>
*
* @param svg
* SVG as a String
* @return GraphicsNode
* @throws IOException
* Thrown when SVG could not be read properly.
*/
private GraphicsNode buildBatikGraphicsNode(String svg) throws IOException {
UserAgent agent = new UserAgentAdapter();
SVGDocument svgdoc = createSVGDocument(svg, agent);
DocumentLoader loader = new DocumentLoader(agent);
BridgeContext bridgeContext = new BridgeContext(agent, loader);
bridgeContext.setDynamicState(BridgeContext.STATIC);
GVTBuilder builder = new GVTBuilder();
GraphicsNode imageGraphics = builder.build(bridgeContext, svgdoc);
return imageGraphics;
}
private SVGDocument createSVGDocument(String svg, UserAgent agent)
throws IOException {
SVGDocumentFactory documentFactory = new SAXSVGDocumentFactory(
agent.getXMLParserClassName(), true);
SVGDocument svgdoc = documentFactory.createSVGDocument(null,
new StringReader(svg));
return svgdoc;
}
private Image createSvgImage(PdfContentByte contentByte,
float maxPointWidth, float maxPointHeight) throws IOException {
Image image = drawUnscaledSvg(contentByte);
image.scaleToFit(maxPointWidth, maxPointHeight);
return image;
}
}
But when I do this, I still get the small graph. I debugged the app, and the size of the graph is actually 10000x600; it is then scaled to fit.
So I tried manually setting the size to something like 400x600: no dice. I tried forcing the size on the SVG: no dice. And if I make it too big (I think), it simply shows a small 1x1 cm box with shadows. The output from the example is as follows.
I really hope someone can help.
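The arithmetic behind the scaling step may explain the tiny result. iText's Image.scaleToFit scales proportionally, so the smaller of the two ratios wins. The sketch below reproduces that "fit inside a box" calculation in plain Java (a hypothetical helper, not iText code) for the 10000x600 size reported above and the 400x400 box passed to createSvgImage: the width limits the scale to 0.04, leaving a chart only 24 points (about 8.5 mm) tall.

```java
public class ScaleToFitMath {

    // Proportional fit: use the smaller ratio so the aspect ratio is kept
    // and neither side exceeds the box.
    static double[] fit(double width, double height, double maxWidth, double maxHeight) {
        double scale = Math.min(maxWidth / width, maxHeight / height);
        return new double[] {width * scale, height * scale};
    }

    public static void main(String[] args) {
        double[] size = fit(10000, 600, 400, 400);
        // 400 / 10000 = 0.04 is the limiting ratio, so the height becomes 600 * 0.04, roughly 24 pt
        System.out.println(size[0] + " x " + size[1] + " pt");
    }
}
```

With such an extreme aspect ratio, any proportional fit into a square box yields a thin strip, so the SVG's own width and height (or viewBox) are the likelier things to adjust.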
UPDATE
When I remove these two lines:
Rectangle2D bounds = imageGraphics.getBounds();
graphics.translate(-bounds.getX(), -bounds.getY());
and hard-code the sizes, it kind of works. But the image itself is still enormous, and I can't seem to make it fit.
See for example:

Corrupted pdf after some compression using iTextSharp

The following code first creates a PDF, which looks perfect. I took the rest of the code (the compression part) from another post on this site. The problem is that compressed.pdf is 1 KB and Acrobat says the file is damaged and cannot be repaired. I have never written a PDF compressor before. Please take a look at my code and, if possible, suggest corrections to make it work.
private void btnEndScan_Click(object sender, EventArgs e)
{
Document doc1 = new Document(PageSize.A4, 0, 0, 0, 0);
string filename = "Prot_" + label.Text + ".pdf";
try
{
PdfWriter.GetInstance(doc1, new FileStream("C:/" + filename, FileMode.Create));
doc1.Open();
for (int i = 0; i < imageArray.Length; i++)
{
iTextSharp.text.Image pic = iTextSharp.text.Image.GetInstance(imageArray[i], System.Drawing.Imaging.ImageFormat.Bmp);
pic.ScalePercent(36f);
doc1.Add(pic);
}
}
catch (Exception ex)
{
MessageBox.Show("Error creating pdf file" + ex);
}
finally
{
doc1.Close();
PdfReader reader = new PdfReader("C:/" + filename);
string filepath = "C:/compressed/" + filename;
using (MemoryStream ms = new MemoryStream())
{
PdfStamper stamper = new PdfStamper(reader, ms, PdfWriter.VERSION_1_5);
PdfWriter writer = stamper.Writer;
writer.CompressionLevel = PdfStream.BEST_COMPRESSION;
reader.RemoveFields();
reader.RemoveUnusedObjects();
stamper.Reader.RemoveUnusedObjects();
stamper.SetFullCompression();
stamper.Writer.SetFullCompression();
byte[] compressed = ms.ToArray();
reader.Close();
stamper.Close();
using (FileStream fs = File.Create("C:/compressed/compressed.pdf"))
{
fs.Write(compressed, 0, (int)compressed.Length);
fs.Close();
}
}
}
}

You are cutting the file too short.
Take a look at these lines:
byte[] compressed = ms.ToArray();
reader.Close();
stamper.Close();
They should be ordered like this:
stamper.Close();
reader.Close();
byte[] compressed = ms.ToArray();
The order in your code is wrong because:
You should close the reader after the stamper, because the stamper may need access to the reader while closing.
The file is only complete when you close the stamper. At the moment you create the byte[] not all the PDF data has been written yet. The file is incomplete.
Because the byte[] is incomplete, a substantial part of your file is missing when you write it:
fs.Write(compressed, 0, (int)compressed.Length);
The value of compressed.Length is smaller than the size of the complete PDF.
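The same ordering mistake can be reproduced without iText. As a stand-in for PdfStamper, the sketch below uses the JDK's GZIPOutputStream, which (like the stamper) only writes its final bytes, including the 8-byte gzip trailer, when it is closed. Taking a snapshot of the in-memory buffer before close() therefore yields a truncated stream, just like calling ms.ToArray() before stamper.Close().

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

public class CloseBeforeToArray {

    // Writes the payload through a compressing stream and returns two snapshots
    // of the underlying buffer: {taken before close(), taken after close()}.
    static byte[][] snapshots(byte[] payload) {
        try {
            ByteArrayOutputStream ms = new ByteArrayOutputStream();
            GZIPOutputStream gzip = new GZIPOutputStream(ms);
            gzip.write(payload);

            // Wrong order: the compressor still holds buffered data and has not written its trailer
            byte[] early = ms.toByteArray();

            // Right order: close the writer first, then take the bytes
            gzip.close();
            byte[] complete = ms.toByteArray();
            return new byte[][] {early, complete};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("Hello PDF ");
        }
        byte[][] s = snapshots(sb.toString().getBytes(StandardCharsets.UTF_8));
        System.out.println("before close: " + s[0].length + " bytes, after close: " + s[1].length + " bytes");
    }
}
```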

Convert PDF files to images with PDFBox

Can someone give me an example of how to use Apache PDFBox to convert a PDF file into images (one for each page of the PDF)?

Solution for 1.8.* versions:
PDDocument document = PDDocument.loadNonSeq(new File(pdfFilename), null);
List<PDPage> pdPages = document.getDocumentCatalog().getAllPages();
int page = 0;
for (PDPage pdPage : pdPages)
{
++page;
BufferedImage bim = pdPage.convertToImage(BufferedImage.TYPE_INT_RGB, 300);
ImageIOUtil.writeImage(bim, pdfFilename + "-" + page + ".png", 300);
}
document.close();
Don't forget to read the 1.8 dependencies page before doing your build.
Solution for the 2.0 version:
PDDocument document = PDDocument.load(new File(pdfFilename));
PDFRenderer pdfRenderer = new PDFRenderer(document);
for (int page = 0; page < document.getNumberOfPages(); ++page)
{
BufferedImage bim = pdfRenderer.renderImageWithDPI(page, 300, ImageType.RGB);
// suffix in filename will be used as the file format
ImageIOUtil.writeImage(bim, pdfFilename + "-" + (page+1) + ".png", 300);
}
document.close();
The ImageIOUtil class is in a separate download / artifact (pdfbox-tools). Read the 2.0 dependencies page before doing your build; you'll need extra jar files for PDFs with JBIG2 images, for saving to TIFF images, and for reading encrypted files.
Make sure to use the latest update of whatever JDK version you are using, e.g. if you are using JDK 8, don't use 1.8.0_5; use 1.8.0_191 or whatever is the latest at the time you're reading this. Early versions were very slow.
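Rendering at 300 DPI produces large images, which partly explains why speed and memory matter here. PDF page sizes are given in points (1/72 inch), so each side is scaled by dpi / 72. The hypothetical helper below estimates the pixel size and, assuming a TYPE_INT_RGB image (4 bytes per pixel), the memory of one rendered page; PDFBox's own rounding may differ by a pixel.

```java
public class RenderSizeEstimate {

    // Page sizes in PDF are in points (1/72 inch); rendering at `dpi`
    // scales each side by dpi / 72.
    static int[] pixelSize(double widthPt, double heightPt, int dpi) {
        double scale = dpi / 72.0;
        return new int[] {(int) Math.round(widthPt * scale), (int) Math.round(heightPt * scale)};
    }

    public static void main(String[] args) {
        // US Letter is 612 x 792 points (8.5 x 11 inches)
        int[] px = pixelSize(612, 792, 300);
        long bytes = (long) px[0] * px[1] * 4; // TYPE_INT_RGB stores 4 bytes per pixel
        System.out.println(px[0] + " x " + px[1] + " px, about " + bytes / (1024 * 1024) + " MiB");
        // prints: 2550 x 3300 px, about 32 MiB
    }
}
```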

I tried this with PDFBox 2.0.15:
import org.apache.pdfbox.pdmodel.*;
import org.apache.pdfbox.rendering.*;
import java.awt.image.*;
import java.io.*;
import javax.imageio.*;
public class PdfToJpg {
// Renders only the first page (index 0) of the PDF to a JPEG file
public static void PDFtoJPG (String in, String out) throws Exception
{
// try-with-resources closes the document when done
try (PDDocument pd = PDDocument.load (new File (in))) {
PDFRenderer pr = new PDFRenderer (pd);
BufferedImage bi = pr.renderImageWithDPI (0, 300);
ImageIO.write (bi, "JPEG", new File (out));
}
}
}

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;
public class PDFtoJPGConverter {
public List<File> convertPdfToImage(File file, String destination) throws Exception {
File destinationFile = new File(destination);
if (destinationFile.exists()) {
System.out.println("DESTINATION FOLDER ALREADY EXISTS!!!");
} else if (destinationFile.mkdirs()) {
System.out.println("DESTINATION FOLDER CREATED -> " + destinationFile.getAbsolutePath());
} else {
System.out.println("DESTINATION FOLDER NOT CREATED!!!");
}
if (file.exists()) {
List<File> fileList = new ArrayList<File>();
String fileName = file.getName().replace(".pdf", "");
System.out.println("CONVERTER START.....");
// try-with-resources closes the document even if rendering fails
try (PDDocument doc = PDDocument.load(file)) {
PDFRenderer renderer = new PDFRenderer(doc);
for (int i = 0; i < doc.getNumberOfPages(); i++) {
// write each page image into the destination folder
File fileTemp = new File(destinationFile, fileName + "_" + i + ".jpg"); // jpg or png
BufferedImage image = renderer.renderImageWithDPI(i, 200);
// 200 is sample dots per inch.
// if necessary, change 200 into another integer.
ImageIO.write(image, "JPEG", fileTemp); // JPEG or PNG
fileList.add(fileTemp);
}
}
System.out.println("CONVERTER STOPPED.....");
System.out.println("IMAGE SAVED AT -> " + destinationFile.getAbsolutePath());
return fileList;
} else {
System.err.println(file.getName() + " FILE DOES NOT EXIST");
}
return null;
}
public static void main(String[] args) {
try {
PDFtoJPGConverter converter = new PDFtoJPGConverter();
Scanner sc = new Scanner(System.in);
System.out.print("Enter the destination folder for the images \n");
// Destination = D:/PPL/
String destination = sc.nextLine();
System.out.print("Enter the PDF files to convert (with folder), separated by commas \n");
String sourcePathWithFileName = sc.nextLine();
// Source Path = D:/PDF/ant.pdf,D:/PDF/abc.pdf,D:/PDF/xyz.pdf
// check the content with isEmpty(); "!=" on strings only compares references
if (sourcePathWithFileName != null && !sourcePathWithFileName.trim().isEmpty()) {
String[] files = sourcePathWithFileName.split(",");
for (String file : files) {
File pdf = new File(file.trim());
System.out.println("FILE:>> " + pdf);
converter.convertPdfToImage(pdf, destination);
}
}
} catch (Exception ex) {
ex.printStackTrace();
}
}
}
Here I am using Apache pdfbox-2.0.8, commons-logging-1.2 and fontbox-2.0.8.
HAPPY CODING :)
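The converter above derives the output name with file.getName().replace(".pdf", ""), which removes every occurrence of ".pdf" (so "my.pdf.notes.pdf" becomes "my.notes") and leaves ".PDF" untouched. A small hypothetical helper that only strips a trailing extension, in any letter case, could be used instead:

```java
import java.util.Locale;

public class BaseName {

    // Removes a trailing ".pdf" extension (any letter case) and nothing else,
    // e.g. "my.pdf.notes.pdf" -> "my.pdf.notes", "report.PDF" -> "report".
    static String stripPdfExtension(String fileName) {
        if (fileName.toLowerCase(Locale.ROOT).endsWith(".pdf")) {
            return fileName.substring(0, fileName.length() - ".pdf".length());
        }
        return fileName;
    }

    public static void main(String[] args) {
        System.out.println(stripPdfExtension("my.pdf.notes.pdf")); // my.pdf.notes
        System.out.println(stripPdfExtension("report.PDF"));       // report
        System.out.println(stripPdfExtension("image.png"));        // image.png
    }
}
```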

If you already have the pdfbox-tools artifact (or the pdfbox-app jar) on your classpath, you can simply use the PDFToImage command-line class it contains instead of writing your own loop.
Kotlin:
PDFToImage.main(arrayOf<String>("-outputPrefix", "newImgFilenamePrefix", existingPdfFilename))
Other options are documented at: https://pdfbox.apache.org/docs/2.0.8/javadocs/org/apache/pdfbox/tools/PDFToImage.html

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.rendering.PDFRenderer;
import javax.imageio.ImageIO;
import java.awt.image.BufferedImage;
import java.io.File;
import java.nio.file.Path;
public class Pdf2Image {
public String convertPdf2Img(String fileInput, Path path) {
String destDir = "";
try {
String destinationDir = path.toString();
File sourceFile = new File(fileInput);
File destinationFile = new File(destinationDir);
if (!destinationFile.exists()) {
destinationFile.mkdirs(); // also creates missing parent folders
System.out.println("Folder Created -> " + destinationFile.getAbsolutePath());
}
if (sourceFile.exists()) {
// try-with-resources closes the document even if rendering fails
try (PDDocument document = PDDocument.load(sourceFile)) {
PDFRenderer pdfRenderer = new PDFRenderer(document);
String fileName = sourceFile.getName().replace(".pdf", "");
for (int pageNumber = 0; pageNumber < document.getNumberOfPages(); ++pageNumber) {
// renderImage(page) renders at 72 DPI; use renderImageWithDPI for a higher resolution
BufferedImage bim = pdfRenderer.renderImage(pageNumber);
// destDir ends up holding the path of the last image written
destDir = destinationDir + File.separator + fileName + "_" + pageNumber + ".png";
ImageIO.write(bim, "png", new File(destDir));
}
}
System.out.println("Image saved at -> " + destinationFile.getAbsolutePath());
} else {
System.err.println(sourceFile.getName() + " File does not exist");
}
} catch (Exception e) {
e.printStackTrace();
}
return destDir;
}
}

Here is part of my code to convert a PDF (from a multipart file) to a JPG thumbnail, which I save as a Base64 string. I used PDFBox 2.0.21.
private static String generatePdfThumbnail(byte[] imageInBytesArray) throws IOException {
// try-with-resources closes the document after rendering
try (PDDocument document = PDDocument.load(imageInBytesArray)) {
PDFRenderer renderer = new PDFRenderer(document);
// render the first page at 72 DPI (scale 1)
BufferedImage bufferedImage = renderer.renderImage(0);
ByteArrayOutputStream baos = new ByteArrayOutputStream();
boolean foundWriter = ImageIO.write(bufferedImage, "jpg", baos);
if (!foundWriter) {
return "";
}
byte[] fileContent = baos.toByteArray();
return Base64.getEncoder().encodeToString(fileContent);
}
}
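The foundWriter check matters because ImageIO.write returns false when no registered writer supports the format and image type; on recent JDKs, for example, the built-in JPEG writer refuses images with an alpha channel. The sketch below is a hypothetical helper (toBase64Jpeg is not a PDFBox or JDK method) that copies any image onto an opaque RGB canvas before encoding, so it also works for images that are not already RGB.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Base64;
import javax.imageio.ImageIO;

public class JpegBase64 {

    // Copies the image onto an opaque RGB canvas (transparent areas become white),
    // encodes it as JPEG and returns Base64 text, or "" if no JPEG writer was found.
    static String toBase64Jpeg(BufferedImage source) {
        BufferedImage rgb = new BufferedImage(source.getWidth(), source.getHeight(), BufferedImage.TYPE_INT_RGB);
        Graphics2D g = rgb.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, rgb.getWidth(), rgb.getHeight());
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose();
        }
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try {
            if (!ImageIO.write(rgb, "jpg", baos)) {
                return "";
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return Base64.getEncoder().encodeToString(baos.toByteArray());
    }

    public static void main(String[] args) {
        BufferedImage argb = new BufferedImage(20, 10, BufferedImage.TYPE_INT_ARGB);
        String b64 = toBase64Jpeg(argb);
        // JPEG data starts with the bytes FF D8 FF, which Base64-encode to "/9j/"
        System.out.println(b64.substring(0, 4));
    }
}
```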
