ImageMagick.Net - convert pdf to tiff - pdf

I am running into an issue when converting from pdf to tiff. Here is the code I used (based on a sample provided in the documentation):
private void convImageMx(string pdfFile)
{
var settings = new MagickReadSettings();
// Settings the density to 300 dpi will create an image with a better quality
settings.Density = new Density(300, 300);
settings.ColorType = ColorType.TrueColor;
string tifpath = Path.GetDirectoryName(pdfFile) + "\\" + Path.GetFileNameWithoutExtension(pdfFile);
using (var images = new MagickImageCollection())
{
// Add all the pages of the pdf file to the collection
images.Read(pdfFile, settings);
var page = 1;
foreach (var image in images)
{
// Write page to file that contains the page number
image.Format = MagickFormat.Ptif;
image.Crop(image.Width, image.Height);
image.Write(tifpath + "_p_" + page + ".tif");
page++;
}
}
}
When I provide a multiple pdf as input, I get multiple tiff files - one file per page. However, each file contains 7 pages which are shrinking images of the original page and the size is very large (original pdf size is 328k, the size of one tiff is 67mb!).
I think I need to set the compression property as well as crop property correctly. But did not find any documentation with .NET.
[EDIT] I commented the line with density so that the size issue is fixed. However, the repeating images is still an issue.

Related

How to convert PDF to image with same resolution as original PDF

I am using Aspose.Pdf to convert PDF to Image and latter convert Image back to Pdf. I would like to keep resolution, width and height of the new Pdf same as original Pdf.
using (var newDocument = new Document())
{
using (var originalDocument = new Document("c:\\source.pdf"))
{
foreach (Aspose.Pdf.Page page in originalDocument.Pages)
{
if (page.Rotate != Aspose.Pdf.Rotation.None)
{
using (var ms = new MemoryStream())
{
// Create Jpegdevice with specified attributes
// Width, Height, Resolution, Quality
// Quality [0-100], 100 is Maximum
// Create Resolution object
Resolution resolution = new Resolution(Convert.ToInt32(page.ArtBox.Width), Convert.ToInt32(page.ArtBox.Height));
JpegDevice device = new JpegDevice(resolution, 100);
// Convert a particular page and save the image to stream
device.Process(page, ms);
var newPage = newDocument.Pages.Add();
// Set margins so image will fit, etc.
newPage.PageInfo.Margin.Bottom = 0;
newPage.PageInfo.Margin.Top = 0;
newPage.PageInfo.Margin.Left = 0;
newPage.PageInfo.Margin.Right = 0;
// Create an image object
Aspose.Pdf.Image image1 = new Aspose.Pdf.Image();
// Set the image file stream
image1.ImageStream = ms;
// Add the image into paragraphs collection of the section
newPage.Paragraphs.Add(image1);
// YES, save after every page otherwise we get out of memory exception
newDocument.Save(txtFolder.Text + "\\out.pdf");
}
}
else
{
newDocument.Pages.Add(page);
newDocument.Save(txtFolder.Text + "\\out.pdf");
}
}
}
}
Qustions
1> The original PDF size is 18MB but the above code creates large file of size 163MB. I would like keep the size as it is or least somewhat near to original file size by keeping the same resolution.
2>Resolution class also takes int valueX and int valueY parameters. And JpegDevice class takes int width and int height parameters. However it does not specify unit, weathers its in points, pixels, inches etc?
As per Aspose.Pdf documentation
The conversion from point to pixel depends on an image's DPI (dots per
inch) property. For example, if an image's DPI is 96 (96 pixels for
each inch), and it is 100 points high, its height in pixels is (100 /
72) * 96 = 133.3. The general formula is: pixels = ( points / 72 ) *
DPI.
However that does not answer the question what unit the API is asking?
Original Issue
We get PDF from our clients. Some of the PDF we receive has rotation of 90 or 270 degree. However when you open them in Adobe or any other PDF viewer, the page's orientation is correct (it is not landscape or horizontal). All this mess i have to do becuase Aspose.Pdf cannot keep the same orientation when change the rotation to None. So suggested approach is convert PDF to Image and latter convert Image back to Pdf
https://forum.aspose.com/t/cannot-set-cropbox-on-rotated-page/201659/3
https://forum.aspose.com/t/remove-rotation-from-a-document-and-set-it-upright/29170

How to add watermark on existing pdf file

I am trying to add watermark on pdf file using PdfSharp, I tried from this link
http://www.pdfsharp.net/wiki/Watermark-sample.ashx
but am not able to get how to get the existing pdf file page object and how to watermark on that page.
Help?
Basically, the samples are only snippets. You can download the source and with that you get a bunch of samples, including this watermark example.
The following comes from PDFSharp-MigraDocFoundation-1_32/PDFsharp/samples/Samples C#/Based on GDI+/Watermark/Program.cs
Quite simple, really ... I am only showing the code up to the for loop that goes over each page. You should have a look at the full file.
[...]
const string watermark = "PDFsharp";
const int emSize = 150;
// Get a fresh copy of the sample PDF file
const string filename = "Portable Document Format.pdf";
File.Copy(Path.Combine("../../../../../PDFs/", filename),
Path.Combine(Directory.GetCurrentDirectory(), filename), true);
// Create the font for drawing the watermark
XFont font = new XFont("Times New Roman", emSize, XFontStyle.BoldItalic);
// Open an existing document for editing and loop through its pages
PdfDocument document = PdfReader.Open(filename);
// Set version to PDF 1.4 (Acrobat 5) because we use transparency.
if (document.Version < 14)
document.Version = 14;
for (int idx = 0; idx < document.Pages.Count; idx++)
{
//if (idx == 1) break;
PdfPage page = document.Pages[idx];
[...]

How to convert existing pdf files to A4 size using pdfbox?

I want to set a size(A4) to an existing document.
I am using pdfbox for watermarking. I used the following link to add watermark. Here I am using another file in which watermark text is there. Latter we are only adding this layer as overlay to original file.
Here the problem arises when file with watermark text is with different size than original document to which the watermark is to be added. In those case the watermark is not getting added properly in terms of position.
Version: I am using pdfbox 1.8. I tried with 2.0 but I am more comfortable with this version.
Here is the code
PDDocument originalPdfFile = PDDocument.load(filename);
PDRectangle pdRect=new PDRectangle(595, 842);//Here I am setting height and width in terms of points
List PageList = originalPdfFile.getDocumentCatalog().getAllPages();
int noOfPages=PageList.size();
System.out.println("No of pages in original document="+noOfPages);
PDPage page=new PDPage();
//PDPage page=new PDPage(PDPage.PAGE_SIZE_A4);
//Here also I tried to add page size
for (int i = 0; i < PageList.size(); i++) {
page=(PDPage)PageList.get(i);
System.out.println("Original Document size in page before cropping: "+(i+1)+", Page Resolution: "+page.getMediaBox());
page.setMediaBox(pdRect);
System.out.println("Original Document size in page after cropping: "+(i+1)+", Page Resolution: "+page.getMediaBox());
//System.out.println("Original Document size in page: "+i+", Height: "+page.getMediaBox().getHeight()+",Width: "+page.getMediaBox().getWidth());
PDRectangle rec=page.getMediaBox();
generateWatermarkText(organisationName,rec);
}
HashMap<Integer, String> overlayGuide = new HashMap<Integer, String>();
for(int i=0; i<originalPdfFile.getNumberOfPages(); i++)
{
overlayGuide.put(i+1, "C:/drm/final/final.pdf");
//watermarktext.pdf is the document which is a one page PDF with your watermark image in it.
}
Overlay overlay = new Overlay();
overlay.setInputPDF(originalPdfFile);
overlay.setOutputFile(filename);
overlay.setOverlayPosition(Overlay.Position.FOREGROUND);
overlay.overlay(overlayGuide,false);
//pdf will have the original PDF with watermarks.
The above code add watermark successfully but I am not able to shrink the page.
This line
PDRectangle pdRect=new PDRectangle(595, 842);
crops the page but it cuts the contains of the page, which I don't want. I want the contains but to should be fit in that page and the page should be of specified size(like A4 in my case).

Cannot stamp certain PDFs

PDF stamping works for nearly every document I have tried. However, a client scanned some pages and his computer generated a PDF document that is resistant to stamping. The embedded image files are in JBIG2 format, but I am not sure if that is important. I have debugged the PDF with Apache's pdfbox, and I can see the text is embedded. It just doesn't show up.
Here is the PDF that won't stamp: http://demo.clearvillageinc.com/plans.pdf
And my code:
static void Main(string[] args) {
string stamp = "<div style=\"color:#F00;\">Reviewed for Code Compliance</div>";
string fileName = #"C:\temp\source.pdf";
string outputFileName = #"C:\temp\source-output.pdf";
// Open a destination stream.
using (var destStream = new System.IO.MemoryStream()) {
using (var sourceReader = new PdfReader(fileName)) {
// Convert the HTML into a stamp.
using (var stampData = FromHtml(stamp)) {
using (var stampReader = new PdfReader(stampData)) {
using (var stamper = new PdfStamper(sourceReader, destStream)) {
stamper.Writer.CloseStream = false;
// Add the stamp stream to the source document.
var stampPage = stamper.GetImportedPage(stampReader, 1);
// Process all of the pages in the source document.
for (int i = 1; i <= sourceReader.NumberOfPages; i++) {
var canvas = stamper.GetOverContent(i);
canvas.AddTemplate(stampPage, 0, -50);
}
}
}
}
}
// Finished. Save the file.
using (var fs = new System.IO.FileStream(outputFileName, FileMode.Create)) {
destStream.Position = 0;
destStream.CopyTo(fs);
}
}
}
public static System.IO.Stream FromHtml(string html) {
var ms = new System.IO.MemoryStream();
// Convert html to pdf.
using (var document = new iTextSharp.text.Document()) {
var writer = iTextSharp.text.pdf.PdfWriter.GetInstance(document, ms);
writer.CloseStream = false;
document.Open();
using (var sr = new System.IO.StringReader(html)) {
XMLWorkerHelper.GetInstance().ParseXHtml(writer, document, sr);
}
}
ms.Position = 0; // Reset for reading.
return ms;
}
One part of a page definition is the "MediaBox" which controls the page's size. This property takes two locations that specify the coordinates of two opposite corners of a rectangle. Although not required, most PDFs specify the lower left corner first followed by the upper right corner. Also, most PDF use 0x0 for the lower left and then whatever the page's width and height for the top corner. So an 8.5x11 inch PDF would be 0,0 and 612,792 (8.5 * 72 = 612 and 11 * 72 = 792) and this would be written as 0,0,612,792.
Your scanned PDF, however, has for whatever reason decided to treat 0,7072 as the lower left corner and 614,7864 as the top right corner. That still gives us (almost) an 8.5x11 page size but if you try to draw something at 0,0 it will be 7,072 pixels below the actual page. You can see this in Acrobat Pro by zooming out very far (1% for me), picking Tools, Edit Object and then doing a Select All. You should see something way far down selected, too.
To get around this, you need to respect the page's boundaries.
for (int i = 1; i <= sourceReader.NumberOfPages; i++) {
//Get the page to be stamped
var pageToBeStamped = sourceReader.GetPageSize(i);
var canvas = stamper.GetOverContent(i);
//Offset our new page by 50 pixels off of the destination page's bottom
canvas.AddTemplate(stampPage, pageToBeStamped.Left, pageToBeStamped.Bottom - 50);
}
The code above gets the rectangle for the imported page and uses bottom offset by 50 pixels (from your original code). Also, although not a problem in your case, we use the imported page's actual left edge instead of just zero.
This code can still break, however. The math in the first paragraph uses 72 which is the default for PDFs but this can be changed. Most people don't change it but most people also don't change 0,0. Currently your -50 assumes the 72 which gives the visual perception of moving the stamp about seven-tenths of an inch from the top edge. If you run into this scenario you'll want to look into retrieving the user unit.
Also, as I said in the first paragraph, most applications use lower left upper right but this isn't a hard rule. Someone could specify upper right and bottom left or even top left and bottom right. This is a hard one to take into account but it is something that you should at least be aware of.

Adding a dynamic image to a PDF using ColdFusion and iText

I pieced together some code to insert a dynamic image into a PDF using both ColdFusion and iText, while filling in some form fields as well. After I got it working and blogged about it, I couldn't help but think that there might be a better way to accomplish this. I'm using the basic idea of this in a production app right now so any comments or suggestion would be most welcomed.
<cfscript>
// full path to PDF you want to add image to
readPDF = expandpath(”your.pdf”);
// full path to the PDF we will output. Using creatUUID() to create
// a unique file name so we can delete it afterwards
writePDF = expandpath(”#createUUID()#.pdf”);
// full path to the image you want to add
yourimage = expandpath(”dynamic_image.jpg”);
// JAVA STUFF!!!
// output buffer to write PDF
fileIO = createObject(”java”,”java.io.FileOutputStream”).init(writePDF);
// reader to read our PDF
reader = createObject(”java”,”com.lowagie.text.pdf.PdfReader”).init(readPDF);
// stamper so we can modify our existing PDF
stamper = createObject(”java”,”com.lowagie.text.pdf.PdfStamper”).init(reader, fileIO);
// get the content of our existing PDF
content = stamper.getOverContent(reader.getNumberOfPages());
// create an image object so we can add our dynamic image to our PDF
image = createobject(”java”, “com.lowagie.text.Image”);
// get the form fields
pdfForm = stamper.getAcroFields();
// setting a value to our form field
pdfForm.setField(”our_field”, “whatever you want to put here”);
// initalize our image
img = image.getInstance(yourimage);
// centering our image top center of our existing PDF with a little margin from the top
x = (reader.getPageSize(1).width() - img.scaledWidth()) - 50;
y = (reader.getPageSize(1).height() - img.scaledHeight()) / 2 ;
// now we assign the position to our image
img.setAbsolutePosition(javacast(”float”, y),javacast(”float”, x));
// add our image to the existing PDF
content.addImage(img);
// flattern our form so our values show
stamper.setFormFlattening(true);
// close the stamper and output our new PDF
stamper.close();
// close the reader
reader.close();
</cfscript>
<!— write out new PDF to the browser —>
<cfcontent type=”application/pdf” file = “#writePDF#” deleteFile = “yes”>
<cfpdf> + DDX seems possible.
See http://forums.adobe.com/thread/332697
I have made it in another way with itext library
I don´t want overwrite my existing pdf with the image to insert, so just modify the original pdf inserting the image, just insert with itext doesn´t work for me.
So, I have to insert the image into a blank pdf (http://itextpdf.com/examples/iia.php?id=59)
And then join my original pdf and the new pdf-image. Obtaining one pdf with several pages.
(http://itextpdf.com/examples/iia.php?id=110)
After that you can overlay the pdf pages with this cool concept
http://itextpdf.com/examples/iia.php?id=113