PDF lossy compression - pdf

I'm looking for a library or command-line program that can compress PDFs.
Compression speed and file size are very important.
The PDFs are full of very large print-quality images.
Adobe Acrobat does high-quality, fast compression but does not allow "reduced size pdfs" to be saved through a programmatic interface.
Ghostscript does high-quality compression be takes way too long (minutes).

If a commercial library is an option, you could give Amyuni PDF Creator a try. There is .net version (C#/VB.Net etc) and an ActiveX version (for C++/Delphi/VB/PHP etc).
You can iterate through all the objects of each page, pick those who are images, and reduce their size. You have several possibilities there:
Setting a lower compression rate.
Down-sampling (extracting the image, re-sizing it to a lower
resolution, and putting it back in your file)
Combining the previous two.
Here is how the code would look like for the first option, in C#, using Amyuni PDF Creator .Net:
//open a pdf document
document.Open("c:\\temp\\myfile.pdf","");
IacPage page1 = document.GetPage (1);
Amyuni.PDFCreator.IacAttribute attribute = page1.AttributeByName ("Objects");
// listobj is an array list of graphic objects
System.Collections.ArrayList listobj = (System.Collections.ArrayList) attribute.Value;
foreach ( object pdfObj in listobj )
{
if ((IacObjectType)pdfObj.AttributeByName("ObjectType").Value == IacObjectType.acObjectTypePicture)
{
if ((IacImageCompressionConstants)pdfObj.AttributeByName("Compression").Value == IacImageCompressionConstants.acCompressionJPegMedium)
pdfObj.AttributeByName("Compression").Value = IacImageCompressionConstants.acCompressionJPegLow;
if ((IacImageCompressionConstants)pdfObj.AttributeByName("Compression").Value == IacImageCompressionConstants.acCompressionJPegHigh)
pdfObj.AttributeByName("Compression").Value = IacImageCompressionConstants.acCompressionJPegMedium;
// (...)
}
}
usual disclaimer applies

You might want to try Docotic.Pdf library for your task.
Here is a code that scales all images that have width or height greater or equal to 256. Scaled images are then encoded using JPEG compression with quality set to 65.
public static void RecompressToJpeg(string path, string outputPath)
{
using (PdfDocument doc = new PdfDocument(path))
{
foreach (PdfImage image in doc.Images)
{
// image that is used as mask or image with attached mask are
// not good candidates for recompression
if (!image.IsMask && image.Mask == null && (image.Width >= 256 || image.Height >= 256))
image.Scale(0.5, PdfImageCompression.Jpeg, 65);
}
doc.Save(outputPath);
}
}
You could also just recompress images without changing their sizes using one of the RecompressWithJpeg methods (or one of other RecompressXXX methods).
And images can be resized to specified width and height using one of the ResizeTo methods. Please note that you will need to take aspect ratio into account in the latter case.
Disclaimer: I work for the vendor of the library.

Related

ImageMagick.Net - convert pdf to tiff

I am running into an issue when converting from pdf to tiff. Here is the code I used (based on a sample provided in the documentation):
private void convImageMx(string pdfFile)
{
var settings = new MagickReadSettings();
// Settings the density to 300 dpi will create an image with a better quality
settings.Density = new Density(300, 300);
settings.ColorType = ColorType.TrueColor;
string tifpath = Path.GetDirectoryName(pdfFile) + "\\" + Path.GetFileNameWithoutExtension(pdfFile);
using (var images = new MagickImageCollection())
{
// Add all the pages of the pdf file to the collection
images.Read(pdfFile, settings);
var page = 1;
foreach (var image in images)
{
// Write page to file that contains the page number
image.Format = MagickFormat.Ptif;
image.Crop(image.Width, image.Height);
image.Write(tifpath + "_p_" + page + ".tif");
page++;
}
}
}
When I provide a multiple pdf as input, I get multiple tiff files - one file per page. However, each file contains 7 pages which are shrinking images of the original page and the size is very large (original pdf size is 328k, the size of one tiff is 67mb!).
I think I need to set the compression property as well as crop property correctly. But did not find any documentation with .NET.
[EDIT] I commented the line with density so that the size issue is fixed. However, the repeating images is still an issue.

SSRS ReportViewer - How to improve image quality when exporting to PDF

For a while now I have noticed whenever I export reports from the ReportViewer control (Webforms version) to PDF format, any included images lose quality and appear slightly pixelated.
They look just fine in the ReportViewer however.
From what I have read, the PDF renderer will size any included images at 96 dpi, no matter what dpi the image is originally.
I have done some digging and came across this post here
I have tried this approach in my own code behind by wiring up a button like so
protected void btnExport_Click(object sender, EventArgs e)
{
string mimeType, encoding, fileNameExtension;
Warning[] warnings;
string[] streams;
var sb = new StringBuilder(1024);
var xr = XmlWriter.Create(sb);
xr.WriteStartElement("DeviceInfo");
xr.WriteElementString("DpiX", "300");
xr.WriteElementString("DpiY", "300");
xr.Close();
byte[] bytes = ReportViewer1.ServerReport.Render("PDF", sb.ToString(), out mimeType, out encoding,
out fileNameExtension, out streams, out warnings);
Response.ContentType = "application/pdf";
Response.AppendHeader("Content-Disposition", "attachment; filename=Test.pdf");
Response.BinaryWrite(bytes);
}
This actually makes my images appear much smaller in the exported PDF compared to what shows in the Report Viewer control.
My original images are 600x600 at 300dpi, I have tried using these images as they are and on the image properties in the report RDL designer set the sizing property to 'Fit' and sizing the image to 0.25in x 0.25in. Again, all looking great in preview mode in the Report Viewer control but then quality is lost when exporting to PDF.
I tried resizing the images to 0.25in x 0.25in in my image editor (paint.net) leaving at 300 dpi, but still no difference in the results.
I'm just going round in circles now, no doubt I am missing something. I hope there is a way and someone can shed some light for me?
Thanks!

Page by page conversion of PDF into TIFF with proper compression

Problem
There are PDF documents with different type of objects inside. There are simple texts. There can be scanned images that are B&W, and also other images, that are true color. The resolution can be quite high for both (~1789X2711).
I need to convert the PDF into a set of single page TIFF files. There are quite good tools for that. For example Irfanview, ImageMagick. The problem is that I have to define a single compression type for all the pages.
Using JPG for all pages would result in loosing details for B&W images and they would be huge compared to lossless fax compression.
Using lossless fax for all would wanish colors and details of true color images.
Idea
It would be nice to examine the PDF page by page. I could check the content of the page. What kind of images are there inside, and which compression is recommanded for the particular page. I think this can be done with IText, but I don't know exactly, how it should be done. A second thing is that I want to do this analysis without fully reading the PDF file. Is it possible?
Maybe the fastest solution would be to create a list of pages for each compression type with IText analysis, and then to call Irfanview to process the choosen pages with the proper compression.
Any ideas and recommendations are welcome.
UPDATE:
I have now an answer. It does not cover all requirements, and its not freeware. Any opensource ideas? Maybe Java based solutions?
This can be done with DotImage DotPdf from Atalasoft (cue the obligatory "I work there and work on these products"). Here is how I would do this task in C#:
PdfImageSource source = new PdfImageSource(pdfStream);
while (source.HasMoreImages()) {
AtalaImage image = source.AcquireNext();
string fileName = GetNextTiffName();
using (FileStream outStm = new FileStream(fileName, FileMode.Create)) {
TiffEncoder encoder = new TiffEncoder();
encoder.Compression = SelectCompression(image.PixelFormat);
image.Save(outStm, encoder, null);
}
source.Release(image);
}
private TiffCompression SelectCompression(PixelFormat pf)
{
switch (pf) {
// 1 bit? use CCITT G4
case PixelFormat.Pixel1bbIndexed: return TiffCompression.Group4FaxEncoding;
// 24 bit? use JPEG
case PixelFormat.Pixel24bppBgr: return TiffCompression.JpegCompression;
// all else, Lzw
default: return TiffCompression.Lzw;
}
}
You can make SelectCompression do pretty much whatever you want. If you select an invalid compression for that pixel format, the encoder will use an appropriate lossless one in its place (for example, if you select CCITT for 24bit color, the encoder will instead use Lzw).
Our PDF decoder knows when a PDF page is just gray and returns a gray image. It does NOT do anything to get you to 1 bit (this is so antialiased text looks good), however you could threshold the gray image and look at the overall differences between it and the gray image to determine if it could go to 1 bit).
Here's how you could do a set of pages:
public void ExtractNPages(Stream pdfStream, params int[] pageIndexes)
{
PdfImageSource source = new PdfImageSource(pdfStream);
for (int i in pageIndexes) {
AtalaImage image = source[i]; // implied Acquire
string fileName = GetNextTiffName();
using (FileStream outStm = new FileStream(fileName, FileMode.Create)) {
TiffEncoder = new TiffEncoder();
encoder.Compression = SelectCompression(image.PixelFormat);
image.Save(outStm, encoder, null);
}
source.Release(image);
}
}
so now you can just do ExtractNPages(stm, 0, 2, 4, 6);

Mirrored (Flipped) Printing A PDF File

I generating ID Cards of Students of My College in a PDF file using ASP.NET (Framework 3.5) and Crystal Reports But I Want to print the Cards in a Transparent Sheet and Paste it on a Plastic Card of same size for that i need the everything to be printed mirrored.
I tried designing the crystal reports in mirrored form itself but could not find a way to write text in mirrored form. Can anyone suggest a way to do this work all I want is to Flip the contents in PDF File or in Crystal Report.
A couple of Ideas:
1) Render the PDF to an Image (using Ghostscript/ImageMagick or commercial PDF library( (eg Tif) and mirror the image for printing
2) Mirror the PDF Itself, might be possible with iTextSharp
3) Use the reporting tool and try and use some kind of reverse font (coud be quick option)
Any API that lets you import pages and write directly to the PDF content stream will let you do this.
In iText (Java), it'd look something like this:
PdfReader reader = new PdfReader(pdfPath);
Document doc = new Document();
PdfWriter writer = PdfWriter.getInstance( doc, new FileOutputStream(outPath) );
for (int pageNum = 1; pageNum <= reader.getNumberOfPages(); ++pageNum) {
PdfImportedPage page = writer.getImportedPage(reader, pageNum);
PdfContentByte pageContent = writer.getDirectContent();
// flip around vertical axis
pageContent.addTemplate(page, 1f, 0f, 0f, -1f, page.getWidth(), 0f);
doc.newPage();
}
The above code is making the following ass-u-me-ptions:
The default Document() page size matches the size of the current PdfImportedPage.
The source pages aren't rotated.
There are no annotations, optional content groups (layers), and various other bits that aren't just represented in the page contents.
Some workarounds:
// keep the page size consistent
PdfImportedPage page = writer.getImportedPage(reader, pageNum);
doc.newPage(page.getBoundingBox());
PdfContentByte pageContent = writer.getDirectContent();
pageContent.addTemplate(...);
// to compensate for a page's rotation, you need to either rotate the target page
// Easy in PdfStamper, virtually impossible with `Document` / `PdfWriter`.
AffineTransform unRotate = AffineTranform.getRotateInstance(degToRad(360 - pageRotation), pageCenterX, pageCenterY)
AffineTransform flip = new AffineTransform(1f, 0f, 0f, -1f, page.getWidth(), 0f);
AffineTransform finalTrans = flip;
finalTrans.concatenate(unRotate);
pageContent.addTemplate(page, finalTrans);
FAIR WARNING: My 2d matrix-fu isn't all that strong. I'm almost certainly doing something wrong. Debugging these sorts of things is a real PITA. Stuff either "looks right" or is so badly screwed up its off the page entirely (ergo invisible, so you don't know which way it went). I often change the page rectangles by [-1000 -1000 1000 1000] just so I can see where it all went. Fun stuff.
As for copying annotations and such... ouch. PdfCopy does all that for you, via it's addPage() method, but that doesn't let you transform the page content first. Any changes you make to the PdfImportedPage are ignored. You're really stuck with The Hard Way... manually copying all the fiddly bits and changing them to compensate for your flipped page... or messing with the source to addPage() to get the results you want. Both require some in-depth knowledge of PDF.
Given the specifics, you probably don't need to worry about it, but it's worth mentioning in case someone with a different situation comes along with the same goal.

AI (Adobe Illustrator) Files Detect Rasterized

Is there a way to detect whether there are any rasterized components to an Adobe Illustrator file? Under normal circumstances such a file can be vector based (in which case it will scale well when the size is increased) but if there's a pasted image in the file, this of course won't scale. Any ideas? Any programming language implementation is welcome although in the end I would be emitting C#...
Reference Illustrator with COM:
bool HasRaster = false;
Illustrator.Application app = new Illustrator.Application();
Illustrator.Document doc = app.Open("/FileName.AI", null, null);
HasRaster = (doc.RasterItems.Count > 0) ? true : false;
app.Quit();