I have written a program that reads pages of a PDF file using the PDFBox API and sends each BufferedImage to the following method, which converts it to black and white. My program then writes the result to TIFF files using FileUtils.
private BufferedImage toBlacknWhite(BufferedImage imageBuffer) {
    if (imageBuffer == null) {
        return null; // nothing to convert
    }
    // TYPE_BYTE_BINARY is 1 bit per pixel; drawing onto it thresholds the source
    BufferedImage bw = new BufferedImage(imageBuffer.getWidth(),
            imageBuffer.getHeight(), BufferedImage.TYPE_BYTE_BINARY);
    Graphics2D g2d = bw.createGraphics();
    g2d.drawImage(imageBuffer, 0, 0, null);
    g2d.dispose();
    return bw;
}
The problem I am having is that the output TIFF files are losing major portions of the image and are of very poor quality. Please suggest a way to improve the quality of the output image.
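For reference, the PDFBox rendering step described above probably looks something like the sketch below (assuming PDFBox 2.x and its PDFRenderer; the file name is a placeholder). The DPI passed to renderImageWithDPI matters a lot here: rendering at the default 72 DPI and then thresholding to 1-bit TYPE_BYTE_BINARY discards most of the detail, so a higher density is worth trying.

try (PDDocument document = PDDocument.load(new File("input.pdf"))) {
    PDFRenderer renderer = new PDFRenderer(document);
    for (int page = 0; page < document.getNumberOfPages(); page++) {
        // Render at 300 DPI instead of the 72 DPI default before thresholding
        BufferedImage rendered = renderer.renderImageWithDPI(page, 300);
        BufferedImage bw = toBlacknWhite(rendered);
        // ... write bw to a TIFF file ...
    }
}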
Original Image:
Output Image:
Thank you.
Related
I am running into an issue when converting from PDF to TIFF. Here is the code I used (based on a sample provided in the documentation):
private void convImageMx(string pdfFile)
{
    var settings = new MagickReadSettings();
    // Setting the density to 300 dpi will create an image with better quality
    settings.Density = new Density(300, 300);
    settings.ColorType = ColorType.TrueColor;
    string tifpath = Path.GetDirectoryName(pdfFile) + "\\" + Path.GetFileNameWithoutExtension(pdfFile);
    using (var images = new MagickImageCollection())
    {
        // Add all the pages of the pdf file to the collection
        images.Read(pdfFile, settings);
        var page = 1;
        foreach (var image in images)
        {
            // Write each page to a file whose name contains the page number
            image.Format = MagickFormat.Ptif;
            image.Crop(image.Width, image.Height);
            image.Write(tifpath + "_p_" + page + ".tif");
            page++;
        }
    }
}
When I provide a multi-page PDF as input, I get multiple TIFF files - one file per page. However, each file contains 7 pages, which are shrinking copies of the original page, and the size is very large (the original PDF is 328 KB; a single TIFF is 67 MB!).
I think I need to set the compression and crop properties correctly, but I did not find any documentation for .NET.
[EDIT] I commented out the density line, which fixed the size issue. However, the repeating images are still a problem.
I am trying to convert multi-line text to an image, but I couldn't find a way to draw it with different font formats. Is there any way to do this? Thanks in advance.
What you could do is simply load an image, obtain its graphics context, and then draw text with a Graphics2D object using different fonts. Here's how you could do this:
BufferedImage image = ImageIO.read(new File("test.png"));
Graphics2D g2d = image.createGraphics();
g2d.setFont(new Font("TimesRoman", Font.PLAIN, fontSize));
g2d.drawString("test", posx, posy); // posx/posy give the baseline position of the text
If you simply want to draw on a blank image and then save it, just create a new BufferedImage with the BufferedImage(width, height, imageType) constructor (there is no default constructor). Then, to save the image:
File output = new File("test.png");
ImageIO.write(image, "png", output);
If you want to know more about this, here's a link to Oracle's Java2D tutorial, from which these examples are adapted: https://docs.oracle.com/javase/tutorial/2d/images/drawonimage.html
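Putting the pieces together for the multi-line case, here is a minimal self-contained sketch that draws two lines with different fonts onto a fresh image and saves it (the font names, sizes, and coordinates are arbitrary placeholders):

import java.awt.Color;
import java.awt.Font;
import java.awt.Graphics2D;
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class MultiLineTextImage {
    public static void main(String[] args) throws IOException {
        BufferedImage image = new BufferedImage(400, 120, BufferedImage.TYPE_INT_RGB);
        Graphics2D g2d = image.createGraphics();
        // TYPE_INT_RGB starts out black, so paint a white background first
        g2d.setColor(Color.WHITE);
        g2d.fillRect(0, 0, image.getWidth(), image.getHeight());
        g2d.setColor(Color.BLACK);
        // Each drawString call can use its own font
        g2d.setFont(new Font("Serif", Font.PLAIN, 24));
        g2d.drawString("First line in a plain serif font", 10, 40);
        g2d.setFont(new Font("SansSerif", Font.BOLD, 18));
        g2d.drawString("Second line in bold sans-serif", 10, 80);
        g2d.dispose();
        ImageIO.write(image, "png", new File("out.png"));
    }
}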
For a while now I have noticed that whenever I export reports from the ReportViewer control (WebForms version) to PDF format, any included images lose quality and appear slightly pixelated.
They look just fine in the ReportViewer however.
From what I have read, the PDF renderer will size any included images at 96 dpi, no matter what dpi the image is originally.
I did some digging and came across this post here.
I have tried this approach in my own code-behind by wiring up a button like so:
protected void btnExport_Click(object sender, EventArgs e)
{
    string mimeType, encoding, fileNameExtension;
    Warning[] warnings;
    string[] streams;
    // Build a DeviceInfo fragment asking the PDF renderer for 300 dpi output
    var sb = new StringBuilder(1024);
    var xr = XmlWriter.Create(sb);
    xr.WriteStartElement("DeviceInfo");
    xr.WriteElementString("DpiX", "300");
    xr.WriteElementString("DpiY", "300");
    xr.Close(); // Close() also ends the open DeviceInfo element
    byte[] bytes = ReportViewer1.ServerReport.Render("PDF", sb.ToString(), out mimeType, out encoding,
        out fileNameExtension, out streams, out warnings);
    // Stream the rendered PDF back to the browser as a download
    Response.ContentType = "application/pdf";
    Response.AppendHeader("Content-Disposition", "attachment; filename=Test.pdf");
    Response.BinaryWrite(bytes);
}
This actually makes my images appear much smaller in the exported PDF compared to what is shown in the ReportViewer control.
My original images are 600x600 at 300 dpi. I have tried using these images as they are, with the sizing property set to 'Fit' in the image properties of the report RDL designer and the image sized to 0.25in x 0.25in. Again, everything looks great in preview mode in the ReportViewer control, but quality is lost when exporting to PDF.
I tried resizing the images to 0.25in x 0.25in in my image editor (paint.net), leaving them at 300 dpi, but it made no difference to the results.
I'm just going round in circles now; no doubt I am missing something. I hope there is a way and someone can shed some light on this for me.
Thanks!
Problem
There are PDF documents with different types of objects inside. There is simple text. There can be scanned images that are B&W, and also other images that are true color. The resolution can be quite high for both (~1789x2711).
I need to convert the PDF into a set of single-page TIFF files. There are quite good tools for that, for example IrfanView and ImageMagick. The problem is that I have to define a single compression type for all the pages.
Using JPEG for all pages would lose detail in the B&W images, and they would be huge compared to lossless fax compression.
Using lossless fax compression for all pages would destroy the colors and details of the true-color images.
Idea
It would be nice to examine the PDF page by page: check the content of each page, what kinds of images are inside, and which compression is recommended for that particular page. I think this can be done with iText, but I don't know exactly how it should be done. A second requirement is that I want to do this analysis without fully reading the PDF file. Is that possible?
Maybe the fastest solution would be to create a list of pages for each compression type with an iText analysis, and then call IrfanView to process the chosen pages with the proper compression.
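As a rough sketch of that analysis (assuming iText 5.x; note this only inspects each page's image XObject resources, so inline images and images nested inside form XObjects are missed):

// Returns true when every image XObject on the page is 1-bit,
// i.e. the page is a candidate for CCITT G4 fax compression.
static boolean isBilevelPage(PdfReader reader, int pageNumber) {
    PdfDictionary page = reader.getPageN(pageNumber);
    PdfDictionary resources = page.getAsDict(PdfName.RESOURCES);
    PdfDictionary xobjects =
            resources == null ? null : resources.getAsDict(PdfName.XOBJECT);
    if (xobjects == null) {
        return true; // no images at all, lossless fax is safe
    }
    for (PdfName name : xobjects.getKeys()) {
        PdfObject candidate = xobjects.getDirectObject(name);
        if (!candidate.isStream()) {
            continue;
        }
        PdfDictionary image = (PdfDictionary) candidate;
        if (!PdfName.IMAGE.equals(image.getAsName(PdfName.SUBTYPE))) {
            continue;
        }
        PdfNumber bits = image.getAsNumber(PdfName.BITSPERCOMPONENT);
        if (bits == null || bits.intValue() > 1) {
            return false; // gray or color image found on this page
        }
    }
    return true;
}

Pages for which this returns true would go on the fax-compression list, and the rest on the JPEG/LZW list.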
Any ideas and recommendations are welcome.
UPDATE:
I have an answer now. It does not cover all the requirements, and it's not freeware. Any open-source ideas? Maybe Java-based solutions?
This can be done with DotImage DotPdf from Atalasoft (cue the obligatory "I work there and work on these products"). Here is how I would do this task in C#:
PdfImageSource source = new PdfImageSource(pdfStream);
while (source.HasMoreImages()) {
    AtalaImage image = source.AcquireNext();
    string fileName = GetNextTiffName();
    using (FileStream outStm = new FileStream(fileName, FileMode.Create)) {
        TiffEncoder encoder = new TiffEncoder();
        encoder.Compression = SelectCompression(image.PixelFormat);
        image.Save(outStm, encoder, null);
    }
    source.Release(image);
}
private TiffCompression SelectCompression(PixelFormat pf)
{
    switch (pf) {
        // 1 bit? use CCITT G4
        case PixelFormat.Pixel1bppIndexed: return TiffCompression.Group4FaxEncoding;
        // 24 bit? use JPEG
        case PixelFormat.Pixel24bppBgr: return TiffCompression.JpegCompression;
        // all else, LZW
        default: return TiffCompression.Lzw;
    }
}
You can make SelectCompression do pretty much whatever you want. If you select an invalid compression for a given pixel format, the encoder will use an appropriate lossless one in its place (for example, if you select CCITT for 24-bit color, the encoder will instead use LZW).
Our PDF decoder knows when a PDF page is just gray and returns a gray image. It does NOT do anything to get you to 1 bit (this is so antialiased text looks good); however, you could threshold the gray image and look at the overall difference between the thresholded result and the original to determine whether the page could go to 1 bit.
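Although the answer above is C#, the threshold-and-compare idea itself is library-agnostic. Here is a plain-Java sketch of the measurement (the 128 threshold and 32 tolerance are arbitrary choices):

// Fraction of pixels that are far from pure black or white; a small
// value suggests the gray page can safely be reduced to 1 bit.
static double thresholdLoss(BufferedImage gray) {
    long changed = 0;
    long total = (long) gray.getWidth() * gray.getHeight();
    for (int y = 0; y < gray.getHeight(); y++) {
        for (int x = 0; x < gray.getWidth(); x++) {
            int level = gray.getRaster().getSample(x, y, 0);
            int bw = level < 128 ? 0 : 255; // simple global threshold
            if (Math.abs(level - bw) > 32) {
                changed++;
            }
        }
    }
    return (double) changed / total;
}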
Here's how you could do a set of pages:
public void ExtractNPages(Stream pdfStream, params int[] pageIndexes)
{
    PdfImageSource source = new PdfImageSource(pdfStream);
    foreach (int i in pageIndexes) {
        AtalaImage image = source[i]; // implied Acquire
        string fileName = GetNextTiffName();
        using (FileStream outStm = new FileStream(fileName, FileMode.Create)) {
            TiffEncoder encoder = new TiffEncoder();
            encoder.Compression = SelectCompression(image.PixelFormat);
            image.Save(outStm, encoder, null);
        }
        source.Release(image);
    }
}
so now you can just do ExtractNPages(stm, 0, 2, 4, 6);
Our company uses iText to stamp watermark text (not an image) on some PDF forms. I noticed that 95% of the forms show the watermark correctly, but about 5% do not. I copied two of the original PDF files, one that was watermarked correctly and one that was not, and tested them via a small program with the same result: one got marked, the other did not. I then tried the latest version of the iText jar (version 5.0.6), with the same outcome. I checked the PDF file properties, security settings, etc., and nothing gave any hint. After running the program, the result file does change size and is marked "changed by iText version....".
Here is the sample watermark code (using iText jar version 2.1.7). Note the topText, mainText, and bottomText parameters passed in, which produce three lines of watermark text in the PDF.
Any help appreciated!
public class WatermarkGenerator {
    private static final int TEXT_TILT_ANGLE = 25;
    private static final Color MEDIUM_GRAY = new Color(160, 160, 160);
    private static final int SUPPORT_FONT_SIZE = 42;
    private static final int PRIMARY_FONT_SIZE = 54;

    public static void addWaterMark(InputStream pdfInputStream,
            OutputStream outputStream, String topText,
            String mainText, String bottomText) throws Exception {
        PdfReader reader = new PdfReader(pdfInputStream);
        int numPages = reader.getNumberOfPages();
        // Create a stamper that will copy the document to the output stream.
        PdfStamper stamp = new PdfStamper(reader, outputStream);
        int page = 1;
        BaseFont baseFont = BaseFont.createFont(BaseFont.HELVETICA_BOLDOBLIQUE,
                BaseFont.WINANSI, BaseFont.EMBEDDED);
        float width;
        float height;
        while (page <= numPages) {
            PdfContentByte cb = stamp.getOverContent(page);
            height = reader.getPageSizeWithRotation(page).getHeight() / 2;
            width = reader.getPageSizeWithRotation(page).getWidth() / 2;
            cb = stamp.getUnderContent(page);
            cb.saveState();
            cb.setColorFill(MEDIUM_GRAY);
            // Top Text
            cb.beginText();
            cb.setFontAndSize(baseFont, SUPPORT_FONT_SIZE);
            cb.showTextAligned(Element.ALIGN_CENTER, topText, width,
                    height + PRIMARY_FONT_SIZE + 16, TEXT_TILT_ANGLE);
            cb.endText();
            // Primary Text
            cb.beginText();
            cb.setFontAndSize(baseFont, PRIMARY_FONT_SIZE);
            cb.showTextAligned(Element.ALIGN_CENTER, mainText, width,
                    height, TEXT_TILT_ANGLE);
            cb.endText();
            // Bottom Text
            cb.beginText();
            cb.setFontAndSize(baseFont, SUPPORT_FONT_SIZE);
            cb.showTextAligned(Element.ALIGN_CENTER, bottomText, width,
                    height - PRIMARY_FONT_SIZE - 6, TEXT_TILT_ANGLE);
            cb.endText();
            cb.restoreState();
            page++;
        }
        stamp.close();
    }
}
We solved the problem by changing the Adobe LiveCycle save-file option: File->Save->Properties->Save as, then look at 'Save as type'. The default is 'Acrobat 7.0.5 Dynamic PDF Form File'; we changed it to '7.0.5 Static PDF Form File' (actually, any static type will work). Files saved as static do not have this watermark-disappearing problem. Thanks, Mark, for pointing in the right direction.
You're using the underContent rather than the overContent. Don't do that. It leaves you at the mercy of big, white-filled rectangles that some folks insist on drawing first thing. It's a holdover from less-than-good PostScript interpreters and hasn't been necessary for Many Years.
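Applied to the code in the question, that means keeping the getOverContent handle and deleting the reassignment, roughly:

PdfContentByte cb = stamp.getOverContent(page); // draw on top of the page content
// (and remove the later cb = stamp.getUnderContent(page); line)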
Okay, having viewed your PDF, I can see the problem: this is an XFA-based form (from LiveCycle Designer). Acrobat can (and often does) rebuild the entire file based on the XFA (a type of XML) it contains. That's how your changes are lost: when Acrobat rebuilds the PDF from the XFA, all the existing PDF information is pitched, including your watermark.
The only way to get this to work would be to define the watermark as part of the XFA file contained in the PDF.
Detecting these forms isn't all that hard:
PdfReader reader = new PdfReader(...);
AcroFields acFields = reader.getAcroFields();
XfaForm xfaForm = acFields.getXfaForm();
if (xfaForm != null && xfaForm.isXfaPresent()) {
    // Ohs nose.
    throw new ItsATrapException("We can't repel XML of that magnitude!");
}
Modifying them, on the other hand, could be Quite Challenging, but here are the specs.
Once you've figured out what needs to be changed, it's a simple matter of XML manipulation... but that "figure it out" part could be interesting.
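For what it's worth, the skeleton of such an edit with iText 5 might look like the sketch below; treat the setDomDocument/setChanged write-back pair as an assumption to verify against your iText version, and the file names as placeholders:

PdfReader reader = new PdfReader("form.pdf");
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream("form-marked.pdf"));
XfaForm xfa = reader.getAcroFields().getXfaForm();
if (xfa.isXfaPresent()) {
    org.w3c.dom.Document dom = xfa.getDomDocument();
    // ... locate and modify the relevant template nodes with standard DOM calls ...
    xfa.setDomDocument(dom);  // assumed write-back API
    xfa.setChanged(true);     // ask the stamper to rewrite the XFA stream
}
stamper.close();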
Good hunting.