Adding JPG to PDF extremely slow - pdfbox

I'm trying to write an Image to PDF using PDFBox. I'm using their sample (as attached). Everything is fine, but writing 3.5MB jpeg (3200*2500px) takes roughly 2 seconds.
Is this normal ? Is there any way how to make it faster (at least 10x) ?
public void createPDFFromImage( String inputFile, String image, String outputFile )
throws IOException, COSVisitorException
{
// the document
PDDocument doc = null;
try
{
doc = PDDocument.load( inputFile );
//we will add the image to the first page.
PDPage page = (PDPage)doc.getDocumentCatalog().getAllPages().get( 0 );
PDXObjectImage ximage = null;
if( image.toLowerCase().endsWith( ".jpg" ) )
{
ximage = new PDJpeg(doc, new FileInputStream( image ) );
}
else if (image.toLowerCase().endsWith(".tif") || image.toLowerCase().endsWith(".tiff"))
{
ximage = new PDCcitt(doc, new RandomAccessFile(new File(image),"r"));
}
else
{
//BufferedImage awtImage = ImageIO.read( new File( image ) );
//ximage = new PDPixelMap(doc, awtImage);
throw new IOException( "Image type not supported:" + image );
}
PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true);
contentStream.drawImage( ximage, 20, 20 );
contentStream.close();
doc.save( outputFile );
}
finally
{
if( doc != null )
{
doc.close();
}
}
}

If you are willing to use another product itext could go really fast, take a look at http://tutorials.jenkov.com/java-itext/image.html .Personally, I did this test with a +750k jpg image and took 78 ms
try {
PdfWriter.getInstance(document,
new FileOutputStream("Image2.pdf"));
document.open();
long start = System.currentTimeMillis();
String imageUrl = "c:/Users/dummy/notSoBigImage.jpg";
Image image = Image.getInstance((imageUrl));
image.setAbsolutePosition(500f, 650f);
document.add(image);
document.close();
long end = System.currentTimeMillis() - start;
System.out.println("time: " + end + " ms");
} catch(Exception e){
e.printStackTrace();
}

Related

iTextSharp Document Isn't Working After Deployment

i'm using iTextSharp to create a pdf document then add it as an attachment to send an email using SendGrid.
The code is working locally but after deploying the project in Azure this function stopped working for some reason. I tried to analyze the problem and i think that the document didn't fully created of attached due to the connection. I can't pin point the exact issue to solve it. Any opinions or discussion is appreciated.
Action:
public async Task<IActionResult> GeneratePDF(int? id, string recipientEmail)
{
//if id valid
if (id == null)
{
return NotFound();
}
var story = await _db.Stories.Include(s => s.Child).Include(s => s.Sentences).ThenInclude(s => s.Image).FirstOrDefaultAsync(s => s.Id == id);
if (story == null)
{
return NotFound();
}
var webRootPath = _hostingEnvironment.WebRootPath;
var path = Path.Combine(webRootPath, "dump"); //folder name
try
{
using (System.IO.MemoryStream memoryStream = new System.IO.MemoryStream())
{
iTextSharp.text.Document document = new iTextSharp.text.Document(iTextSharp.text.PageSize.A4, 10, 10, 10, 10);
PdfWriter writer = PdfWriter.GetInstance(document, memoryStream);
document.Open();
string usedFont = Path.Combine(webRootPath + "\\fonts\\", "arial.TTF");
BaseFont bf = BaseFont.CreateFont(usedFont, BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
iTextSharp.text.Font titleFont = new iTextSharp.text.Font(bf, 40);
iTextSharp.text.Font sentencesFont = new iTextSharp.text.Font(bf, 15);
iTextSharp.text.Font childNamewFont = new iTextSharp.text.Font(bf, 35);
PdfPTable T = new PdfPTable(1);
//Hide the table border
T.DefaultCell.BorderWidth = 0;
T.DefaultCell.HorizontalAlignment = 1;
T.DefaultCell.PaddingTop = 15;
T.DefaultCell.PaddingBottom = 15;
//Set RTL mode
T.RunDirection = PdfWriter.RUN_DIRECTION_RTL;
//Add our text
if (story.Title != null)
{
T.AddCell(new iTextSharp.text.Paragraph(story.Title, titleFont));
}
if (story.Child != null)
{
if (story.Child.FirstName != null && story.Child.LastName != null)
{
T.AddCell(new iTextSharp.text.Phrase(story.Child.FirstName + story.Child.LastName, childNamewFont));
}
}
if (story.Sentences != null)
{
.................
}
document.Add(T);
writer.CloseStream = false;
document.Close();
byte[] bytes = memoryStream.ToArray();
var fileName = path + "\\PDF" + DateTime.Now.ToString("yyyyMMdd-HHMMss") + ".pdf";
using (FileStream fs = new FileStream(fileName, FileMode.Create))
{
fs.Write(bytes, 0, bytes.Length);
}
memoryStream.Position = 0;
memoryStream.Close();
//Send generated pdf as attchment
// Create the file attachment for this email message.
var attachment = Convert.ToBase64String(bytes);
var client = new SendGridClient(Options.SendGridKey);
var msg = new SendGridMessage();
msg.From = new EmailAddress(SD.DefaultEmail, SD.DefaultEmail);
msg.Subject = story.Title;
msg.PlainTextContent = "................";
msg.HtmlContent = "..................";
msg.AddTo(new EmailAddress(recipientEmail));
msg.AddAttachment("Story.pdf", attachment);
try
{
await client.SendEmailAsync(msg);
}
catch (Exception ex)
{
Console.WriteLine("{0} First exception caught.", ex);
}
//Remove form root
if (System.IO.File.Exists(fileName))
{
System.IO.File.Delete(fileName);
}
}
}
catch (FileNotFoundException e)
{
Console.WriteLine($"The file was not found: '{e}'");
}
catch (DirectoryNotFoundException e)
{
Console.WriteLine($"The directory was not found: '{e}'");
}
catch (IOException e)
{
Console.WriteLine($"The file could not be opened: '{e}'");
}
return RedirectToAction("Details", new { id = id });
}
try to edit usedfont variable as bellow :
var usedfont = Path.Combine(webRootPath ,#"\fonts\arial.TTF")
It turns out that the problem is far from iTextSharp. I did a remote debugging from this article.
Two parts of the code was causing the problem.
First, for some reason the folder "dump" was not created on Azure wwwroot folder while locally it is. so, i added these lines:
var webRootPath = _hostingEnvironment.WebRootPath;
var path = Path.Combine(webRootPath, "dump");
if (!Directory.Exists(path)) //Here
Directory.CreateDirectory(path);
Second, after debugging it shows that creating the file was failing every time. I replaced the following lines:
using (FileStream fs = new FileStream(fileName, FileMode.Create))
{
fs.Write(bytes, 0, bytes.Length);
}
memoryStream.Position = 0;
memoryStream.Close();
With:
using (FileStream fs = new FileStream(fileName, FileMode.Create))
using (var binaryWriter = new BinaryWriter(fs))
{
binaryWriter.Write(bytes, 0, bytes.Length);
binaryWriter.Close();
}
memoryStream.Close();
Hope this post helps someone.

Reading attachment from a secured PDF

I am working on a PDF file, which is a secured one and an excel is attached in the PDF file.
The following is the code i tried.
static void Main(string[] args)
{
Program pgm = new Program();
pgm.EmbedAttachments();
//pgm.ExtractAttachments(pgm.pdfFile);
}
private void ExtractAttachments(string _pdfFile)
{
try
{
if (!Directory.Exists(attExtPath))
Directory.CreateDirectory(attExtPath);
byte[] password = System.Text.ASCIIEncoding.ASCII.GetBytes("TFAER13052016");
//byte[] password = System.Text.ASCIIEncoding.ASCII.GetBytes("Password");
PdfDictionary documentNames = null;
PdfDictionary embeddedFiles = null;
PdfDictionary fileArray = null;
PdfDictionary file = null;
PRStream stream = null;
//PdfReader reader = new PdfReader(_pdfFile);
PdfReader reader = new PdfReader(_pdfFile, password);
PdfDictionary catalog = reader.Catalog;
documentNames = (PdfDictionary)PdfReader.GetPdfObject(catalog.Get(PdfName.NAMES));
if (documentNames != null)
{
embeddedFiles = (PdfDictionary)PdfReader.GetPdfObject(documentNames.Get(PdfName.EMBEDDEDFILES));
if (embeddedFiles != null)
{
PdfArray filespecs = embeddedFiles.GetAsArray(PdfName.NAMES);
for (int i = 0; i < filespecs.Size; i++)
{
i++;
fileArray = filespecs.GetAsDict(i);
file = fileArray.GetAsDict(PdfName.EF);
foreach (PdfName key in file.Keys)
{
stream = (PRStream)PdfReader.GetPdfObject(file.GetAsIndirectObject(key));
string attachedFileName = fileArray.GetAsString(key).ToString();
byte[] attachedFileBytes = PdfReader.GetStreamBytes(stream);
System.IO.File.WriteAllBytes(attExtPath + attachedFileName, attachedFileBytes);
}
}
}
else
throw new Exception("Unable to Read the attachment or There may be no Attachment");
}
else
{
throw new Exception("Unable to Read the document");
}
}
catch (Exception ex)
{
Console.WriteLine(ex.ToString());
Console.ReadKey();
}
}
private void EmbedAttachments()
{
try
{
if (File.Exists(pdfFile))
File.Delete(pdfFile);
Document PDFD = new Document(PageSize.LETTER);
PdfWriter writer;
writer = PdfWriter.GetInstance(PDFD, new FileStream(pdfFile, FileMode.Create));
PDFD.Open();
PDFD.NewPage();
PDFD.Add(new Paragraph("This is test"));
PdfFileSpecification pfs = PdfFileSpecification.FileEmbedded(writer, #"C:\PDFReader\1.xls", "11.xls", null);
//PdfFileSpecification pfs = PdfFileSpecification.FileEmbedded(writer, attFile, "11", File.ReadAllBytes(attFile), true);
writer.AddFileAttachment(pfs);
//writer.AddAnnotation(PdfAnnotation.CreateFileAttachment(writer, new iTextSharp.text.Rectangle(100, 100, 100, 100), "File Attachment", PdfFileSpecification.FileExtern(writer, "C:\\test.xml")));
//writer.Close();
PDFD.Close();
Program pgm=new Program();
using (Stream input = new FileStream(pgm.pdfFile, FileMode.Open, FileAccess.Read, FileShare.Read))
{
using (Stream output = new FileStream(pgm.epdfFile, FileMode.Create, FileAccess.Write, FileShare.None))
{
PdfReader reader = new PdfReader(input);
PdfEncryptor.Encrypt(reader, output, true, "Password", "secret", PdfWriter.ALLOW_SCREENREADERS);
}
}
}
catch (Exception ex)
{
Console.WriteLine(ex.StackTrace.ToString());
Console.ReadKey();
}
}
}
The above code contains the creation of a encrypted PDF with an excel attachment and also to extract the same.
Now the real problem is with the file which I already have as a requirement document(I cannot share the file) which also has an excel attachment like my example.
But the above code works for the secured PDF which i have created but not for the actual Secured PDF.
While debugging, I found that the Issue is with the following code
documentNames = (PdfDictionary)PdfReader.GetPdfObject(catalog.Get(PdfName.NAMES));
In which,
catalog.Get(PdfName.NAMES)
is returned as NULL, Where as the File created by me, provides the expected output.
Please guide me on the above.
TIA.
As mkl suggested, It has been attached as an Annotated attachment. But the reference which is used in the example is provided ZipFile Method is no longer supported. Hence I found an alternate code attached below.
public void ExtractAttachments(byte[] src)
{
PRStream stream = null;
string attExtPath = #"C:\PDFReader\Extract\";
if (!Directory.Exists(attExtPath))
Directory.CreateDirectory(attExtPath);
byte[] password = System.Text.ASCIIEncoding.ASCII.GetBytes("TFAER13052016");
PdfReader reader = new PdfReader(src, password);
for (int i = 1; i <= reader.NumberOfPages; i++)
{
PdfArray array = reader.GetPageN(i).GetAsArray(PdfName.ANNOTS);
if (array == null) continue;
for (int j = 0; j < array.Size; j++)
{
PdfDictionary annot = array.GetAsDict(j);
if (PdfName.FILEATTACHMENT.Equals(
annot.GetAsName(PdfName.SUBTYPE)))
{
PdfDictionary fs = annot.GetAsDict(PdfName.FS);
PdfDictionary refs = fs.GetAsDict(PdfName.EF);
foreach (PdfName name in refs.Keys)
{
//zip.AddEntry(
// fs.GetAsString(name).ToString(),
// PdfReader.GetStreamBytes((PRStream)refs.GetAsStream(name))
//);
stream = (PRStream)PdfReader.GetPdfObject(refs.GetAsIndirectObject(name));
string attachedFileName = fs.GetAsString(name).ToString();
var splitname = attachedFileName.Split('\\');
if (splitname.Length != 1)
attachedFileName = splitname[splitname.Length - 1].ToString();
byte[] attachedFileBytes = PdfReader.GetStreamBytes(stream);
System.IO.File.WriteAllBytes(attExtPath + attachedFileName, attachedFileBytes);
}
}
}
}
}
Please Let me Know if it can be achieved in any other way.
Thanks!!!

Merging PDFs using iTextSharp removes Trim Box Detail

I am trying to use iTextSharp to merge 2 or more PDF files. However I am not getting any details about the TrimBox. Performing the code below on the PDF (which was merged) always return NULL
Rectangle rect = reader.GetBoxSize(1, "trim");
This is the code for merging.
public void Merge(List<String> InFiles, String OutFile)
{
using (FileStream stream = new FileStream(OutFile, FileMode.Create))
using (Document doc = new Document())
using (PdfCopy pdf = new PdfCopy(doc, stream))
{
doc.Open();
PdfReader reader = null;
PdfImportedPage page = null;
InFiles.ForEach(file =>
{
reader = new PdfReader(file);
for (int i = 0; i < reader.NumberOfPages; i++)
{
page = pdf.GetImportedPage(reader, i + 1);
pdf.AddPage(page);
}
pdf.FreeReader(reader);
reader.Close();
});
}
}
How to keep I keep the box information after the merge?
-Alan-
Here is the code I created to merge Portrait and Landscape docs using iTextSharp. It works rather well.
public void MergeFiles(System.Collections.Generic.List<string> sourceFiles, string destinationFile)
{
Document document=null;
if (System.IO.File.Exists(destinationFile))
System.IO.File.Delete(destinationFile);
try
{
PdfCopy writer = null;
int numberOfPages=0;
foreach(string sourceFile in sourceFiles)
{
PdfReader reader = new PdfReader(sourceFile);
reader.ConsolidateNamedDestinations();
numberOfPages = reader.NumberOfPages;
if(document==null)
{
document = new Document(reader.GetPageSizeWithRotation(1));
writer = new PdfCopy(document, new FileStream(destinationFile, FileMode.Create));
document.Open();
}
for (int x = 1;x <= numberOfPages;x++ )
{
if (writer != null)
{
PdfImportedPage page = writer.GetImportedPage(reader, x);
writer.AddPage(page);
}
}
PRAcroForm form = reader.AcroForm;
if (form != null && writer != null)
writer.CopyAcroForm(reader);
}
}
finally
{
if (document != null && document.IsOpen())
document.Close();
}
}

Java, edit pdf exist text with PDFBox

i trying edit text with library PDFBox and don't no how. I do not know how to get stream of individual text objects, so I could edit the text and or color.
Any idea, example?
Thanks
I found editing text in pdf is not reliable, so try to clear the text with a rectangle (white/background color fill) and write the new text in the cleared position. Here is a sample code.
//to add a link in footer
//to replace a text
//to replace a link/url/href
public static void editTextorUrl(String inputFile, String outputFile)
throws IOException, COSVisitorException {
// the document
PDDocument doc = null;
try {
System.out.println(inputFile);
doc = PDDocument.load(inputFile);
List pages = doc.getDocumentCatalog().getAllPages();
for (int i = 0; i < pages.size(); i++) {
float inch = 72;
PDGamma colourRed = new PDGamma();
colourRed.setR(1);
PDGamma colourBlue = new PDGamma();
colourBlue.setB(1);
PDGamma white = new PDGamma();
white.setR(1);
white.setB(1);
white.setG(1);
PDBorderStyleDictionary borderThick = new PDBorderStyleDictionary();
borderThick.setWidth(inch / 12); // 12th inch
PDBorderStyleDictionary borderThin = new PDBorderStyleDictionary();
borderThin.setWidth(inch / 72); // 1 point
PDBorderStyleDictionary borderULine = new PDBorderStyleDictionary();
borderULine.setStyle(PDBorderStyleDictionary.STYLE_UNDERLINE);
borderULine.setWidth(inch / 72); // 1 point
PDPage page = (PDPage) pages.get(i);
PDFont font = PDType1Font.HELVETICA;
PDPageContentStream contentStream = new PDPageContentStream(
doc, page, true, false);
contentStream.setNonStrokingColor(Color.WHITE);
contentStream.fillRect(55, 27, 144, 17);
contentStream.setNonStrokingColor(Color.BLUE);
contentStream.beginText();
contentStream.setFont(font, 11);
contentStream.moveTextPositionByAmount(55, 37);
contentStream.drawString("www.loasoftwares.com"); //text to be replaced
contentStream.endText();
contentStream.setLineWidth(inch / 300);
contentStream.setStrokingColor(Color.BLUE);
contentStream.drawLine(55, 34, 188, 34);
contentStream.close();
PDAnnotationLink txtLink = new PDAnnotationLink();
PDRectangle position = new PDRectangle();
position.setLowerLeftX(55);
position.setLowerLeftY(27);
position.setUpperRightX(188);
position.setUpperRightY(50);
txtLink.setRectangle(position);
// add an action
PDActionURI action = new PDActionURI();
action.setURI("www.loasoftwares.com");
txtLink.setBorderStyle(borderULine);
txtLink.setAction(action);
txtLink.setColour(white);
page.getAnnotations().add(txtLink);
}
doc.save(outputFile);
} finally {
if (doc != null) {
doc.close();
}
}
}
Remark:
It works under specific circumstances (ASCII'ish font encodings + pretty long string arguments)
/**
* This is an example that will replace a string in a PDF with a new one.
*
* The example is taken from the pdf file format specification.
*
* #author Ben Litchfield
* #version $Revision: 1.3 $
*/
public class ReplaceString {
/**
* Constructor.
*/
public ReplaceString() {
super();
}
/**
* Locate a string in a PDF and replace it with a new string.
*
* #param inputFile The PDF to open.
* #param outputFile The PDF to write to.
* #param strToFind The string to find in the PDF document.
* #param message The message to write in the file.
*
* #throws IOException If there is an error writing the data.
* #throws COSVisitorException If there is an error writing the PDF.
*/
public void doIt( String inputFile, String outputFile, String strToFind, String message)
throws IOException, COSVisitorException {
// the document
PDDocument doc = null;
try
{
doc = PDDocument.load( inputFile );
List pages = doc.getDocumentCatalog().getAllPages();
for( int i=0; i<pages.size(); i++ )
{
PDPage page = (PDPage)pages.get( i );
PDStream contents = page.getContents();
PDFStreamParser parser = new PDFStreamParser(contents.getStream());
parser.parse();
List tokens = parser.getTokens();
for( int j=0; j<tokens.size(); j++ )
{
Object next = tokens.get( j );
if( next instanceof PDFOperator )
{
PDFOperator op = (PDFOperator)next;
//Tj and TJ are the two operators that display
//strings in a PDF
if( op.getOperation().equals( "Tj" ) )
{
//Tj takes one operator and that is the string
//to display so lets update that operator
COSString previous = (COSString)tokens.get( j-1 );
String string = previous.getString();
string = string.replaceFirst( strToFind, message );
previous.reset();
previous.append( string.getBytes("ISO-8859-1") );
}
else if( op.getOperation().equals( "TJ" ) )
{
COSArray previous = (COSArray)tokens.get( j-1 );
for( int k=0; k<previous.size(); k++ )
{
Object arrElement = previous.getObject( k );
if( arrElement instanceof COSString )
{
COSString cosString = (COSString)arrElement;
String string = cosString.getString();
string = string.replaceFirst( strToFind, message );
cosString.reset();
cosString.append( string.getBytes("ISO-8859-1") );
}
}
}
}
}
//now that the tokens are updated we will replace the
//page content stream.
PDStream updatedStream = new PDStream(doc);
OutputStream out = updatedStream.createOutputStream();
ContentStreamWriter tokenWriter = new ContentStreamWriter(out);
tokenWriter.writeTokens( tokens );
page.setContents( updatedStream );
}
doc.save( outputFile );
}
finally
{
if( doc != null )
{
doc.close();
}
}
}
/**
* This will open a PDF and replace a string if it finds it.
* <br />
* see usage() for commandline
*
* #param args Command line arguments.
*/
public static void main(String[] args)
{
ReplaceString app = new ReplaceString();
try
{
if( args.length != 4 )
{
app.usage();
}
else
{
app.doIt( args[0], args[1], args[2], args[3] );
}
}
catch (Exception e)
{
e.printStackTrace();
}
}
/**
* This will print out a message telling how to use this example.
*/
private void usage()
{
System.err.println( "usage: " + this.getClass().getName() +
" <input-file> <output-file> <search-string> <Message>" );
}
}

How can I convert a PDF into a web-browsable image?

I need to create an online viewer which converts PDF files into browsable images, like http://view.samurajdata.se/. I would like to do this in Grails. Does Grails have any plugins for this?
that's is possible by download PDFRenderer.jar fie and writing code is below
downloadedfile = request.getFile('sourceFile');
println "download file->"+downloadedfile
File destFile=new File('web-app/source-pdf/'+downloadedfile+'.pdf');
if(destFile.exists()){
destFile.delete();
}
File file = null;
try{
file = new File('web-app/source-pdf/'+downloadedfile+'.pdf');
downloadedfile.transferTo(file)
println "file->"+file
}catch(Exception e){
System.err.println("File Already Use")
//out.close();
}
File imageFile=new File("web-app/pdf-images");
if(imageFile.isDirectory())
{
String[] list=imageFile.list()
for(int i=0;i<list.length;i++){
File img=new File("web-app/pdf-images/"+i+".png")
img.delete()
}
}
//response.setContentType("image/png");
// response.setHeader("Cache-control", "no-cache");
RandomAccessFile raf;
BufferedImage[] img;
// response.setContentType("image/png");
// response.setHeader("Cache-control", "no-cache");
file=new File('web-app/source-pdf/'+downloadedfile+'.pdf');
try {
raf = new RandomAccessFile(file, "rws");
FileChannel channel = raf.getChannel();
ByteBuffer buf = channel.map(FileChannel.MapMode.READ_WRITE, 0, channel.size());
PDFFile pdffile = new PDFFile(buf);
// draw the first page to an image
int num=pdffile.getNumPages();
img=new BufferedImage[num]
for(int i=0;i<num;i++)
{
PDFPage page = pdffile.getPage(i);
//get the width and height for the doc at the default zoom
int width=(int)page.getBBox().getWidth();
int height=(int)page.getBBox().getHeight();
Rectangle rect = new Rectangle(0,0,width,height);
int rotation=page.getRotation();
Rectangle rect1=rect;
if(rotation==90 || rotation==270)
rect1=new Rectangle(0,0,(int)rect.height,(int)rect.width);
//generate the image
img[i] = (BufferedImage)page.getImage(
width,height , //width & height
rect1, // clip rect
null, // null for the ImageObserver
true, // fill background with white
true // block until drawing is done
);
ImageIO.write(img[i], "png",new File("web-app/pdf-images/"+i+".png"));
}
// out.close();
}
catch (FileNotFoundException e1) {
System.err.println(e1.getLocalizedMessage());
} catch (IOException e) {
System.err.println(e.getLocalizedMessage());
}
file = null;
render(view:'save',model:[images:img])