Apply color band to TIFF in the PDF

• Background :
We are developing an AFP-to-PDF tool. It converts AFP (Advanced Function Presentation) files to PDF.
• Detailed Problem statement :
We have an AFP file with an embedded TIFF image. The image object is described in Function Set 45 and is structured roughly like this:
Image Content
  Begin Tile
    Image Encoding Parameter – TIFF LZW
    Begin Transparency Mask
      Image Encoding Parameter – G4MMR
      Image Data Elements
    End Transparency Mask
    Image Data Elements (IDE Size 32) – 4 bands: CMYK
  End Tile
End Image Content
We want to write this tiled image to PDF using Java and the iText API.
So far we can write the G4MMR image, but we are not able to apply the CMYK color band data (blue color) to it.
• Solution tried :
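The code that writes the G4MMR image is as follows:
// iText 5.x
import java.io.ByteArrayOutputStream;
import com.itextpdf.text.Image;
import com.itextpdf.text.pdf.RandomAccessFileOrArray;
import com.itextpdf.text.pdf.codec.TiffImage;

ByteArrayOutputStream decode = saveAsTIFF(<width>,<height>,<imageByteData>);
// Read the TIFF back from the in-memory buffer returned by saveAsTIFF
RandomAccessFileOrArray ra = new RandomAccessFileOrArray(decode.toByteArray());
int pages = TiffImage.getNumberOfPages(ra);
for (int i1 = 1; i1 <= pages; i1++) { // TIFF page numbers are 1-based in iText
    Image img1 = TiffImage.getTiffImage(ra, i1);
    img1.scaleAbsolute(256, 75);
    document.add(img1);
}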
The saveAsTIFF method is given here:
http://www.jpedal.org/PDFblog/2011/08/ccitt-encoding-in-pdf-files-converting-pdf-ccitt-data-into-a-tiff/
As mentioned, we are not able to apply the 4-band CMYK image color data to this G4MMR image.
• Technology stack with versions of each component :
1. JDK 1.6
2. itextpdf-5.1
-- Umesh Pathak

The AFP resource you're showing is a CMYK TIFF image compressed with LZW. The image also uses a "transparency mask", which is compressed with G4MMR (a slightly different encoding from traditional fax-style G4).
So the image data is already in the CMYK colorspace. Each band (C, M, Y, K) is compressed separately with plain LZW encoding, so it should not be too difficult to extract the bands and store them as a basic CMYK TIFF file. You will also have to convert the transparency mask to G4 or raw data so you can use it in the PDF file to mask the CMYK image.
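Once each band has been LZW-decoded on its own, the four planes still have to be merged into the pixel-interleaved layout (CMYKCMYK...) that a basic 8-bit-per-sample CMYK TIFF or a PDF DeviceCMYK image stream expects. A minimal sketch of that step, assuming each band is already decoded to one byte per sample, row-major, with identical width and height (the class and method names are hypothetical, and the LZW decoding itself is not shown):

```java
import java.util.Arrays;

/** Hypothetical helper: merges four separately decoded band planes into interleaved CMYK. */
public final class BandInterleave {

    /** c, m, y, k: one byte per sample, row-major, all the same length. */
    public static byte[] interleave(byte[] c, byte[] m, byte[] y, byte[] k) {
        int n = c.length;
        if (m.length != n || y.length != n || k.length != n) {
            throw new IllegalArgumentException("all four bands must have the same size");
        }
        byte[] out = new byte[n * 4];
        for (int i = 0; i < n; i++) {
            // One pixel = 4 consecutive bytes: C, M, Y, K
            out[4 * i] = c[i];
            out[4 * i + 1] = m[i];
            out[4 * i + 2] = y[i];
            out[4 * i + 3] = k[i];
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] c = {1, 2}, m = {3, 4}, y = {5, 6}, k = {7, 8};
        System.out.println(Arrays.toString(interleave(c, m, y, k)));
        // prints [1, 3, 5, 7, 2, 4, 6, 8]
    }
}
```

The interleaved buffer is what you would then write as the pixel data of a CMYK TIFF (or a raw DeviceCMYK image), with the converted transparency mask applied separately.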
If you want better control over the PDF output, I suggest you take a look at PDFlib.

You need to assign a CMYK colorspace to your image before adding it to the PDF file. However, I am afraid this may not be fully supported in iText. A workaround could be to convert your image to the default RGB colorspace before adding it to the PDF file, though this will probably cause some quality loss.
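The RGB workaround can be sketched with the common naive formula R = 255 · (1 − C) · (1 − K) and likewise for G and B. This is a minimal, non-colour-managed illustration, assuming 8-bit samples where 255 means full ink; it ignores any ICC profile, which is exactly where the quality loss mentioned above comes from. The class and method names are hypothetical.

```java
/** Naive, non-colour-managed CMYK -> RGB conversion (8 bits per channel). */
public final class NaiveCmykToRgb {

    /** c, m, y, k in 0..255 (255 = full ink); returns packed 0xRRGGBB. */
    public static int toRgb(int c, int m, int y, int k) {
        int r = (255 - c) * (255 - k) / 255;
        int g = (255 - m) * (255 - k) / 255;
        int b = (255 - y) * (255 - k) / 255;
        return (r << 16) | (g << 8) | b;
    }

    /** Converts a pixel-interleaved CMYK buffer (CMYKCMYK...) to packed RGB ints. */
    public static int[] convert(byte[] cmyk) {
        int[] rgb = new int[cmyk.length / 4];
        for (int i = 0; i < rgb.length; i++) {
            // & 0xFF turns Java's signed bytes into 0..255 values
            rgb[i] = toRgb(cmyk[4 * i] & 0xFF, cmyk[4 * i + 1] & 0xFF,
                           cmyk[4 * i + 2] & 0xFF, cmyk[4 * i + 3] & 0xFF);
        }
        return rgb;
    }

    public static void main(String[] args) {
        System.out.printf("%06X%n", toRgb(255, 0, 0, 0)); // prints 00FFFF (cyan)
    }
}
```

The resulting int array can be copied into a TYPE_INT_RGB BufferedImage with setRGB before handing it to your PDF library; a profile-based conversion would give more faithful colours.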

Related

Convert a region of each PDF page to grayscale

I have a PDF that I want to print, and a small region at the left border of each page contains a thick rainbow. To save color ink, I would like to convert only this region to grayscale, or remove it completely by covering it with a white rectangle. I have looked into ImageMagick but could not find a way to do this while keeping all the other colors on the pages.
I have also thought of exporting each page to a separate PDF, applying a rectangle filter to each one, and then combining them again. But I would prefer a simpler approach, as the quality of the graphs seems to decrease each time I convert a PDF.
You do not have to extract each page to do that in ImageMagick. You can process it all in one command. Here is an example.
Create PDF:
convert lena.jpg mandril3.jpg zelda1.jpg test.pdf
Create white image:
convert -size 100x100 xc:white white.png
Apply white image to every page of PDF:
convert test.pdf null: white.png -geometry +50+50 -layers composite result.pdf
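The command above whites the region out. If you would rather turn the region gray, the per-pixel operation is simple once a page exists as a raster image (ImageMagick also rasterizes PDF input, via Ghostscript, so this does not avoid the quality concern). A minimal sketch on a java.awt BufferedImage, assuming Rec. 601 luma weights; the class and method names are hypothetical, and rendering pages to images and back to PDF is not shown:

```java
import java.awt.image.BufferedImage;

/** Hypothetical helper: converts a rectangular region of an RGB raster to grayscale in place. */
public final class RegionToGray {

    public static void grayRegion(BufferedImage img, int x, int y, int w, int h) {
        // Clip the rectangle to the image bounds
        int x1 = Math.min(x + w, img.getWidth());
        int y1 = Math.min(y + h, img.getHeight());
        for (int yy = Math.max(y, 0); yy < y1; yy++) {
            for (int xx = Math.max(x, 0); xx < x1; xx++) {
                int argb = img.getRGB(xx, yy);
                int r = (argb >> 16) & 0xFF, g = (argb >> 8) & 0xFF, b = argb & 0xFF;
                // Rec. 601 luma weights, rounded to the nearest integer
                int lum = (int) Math.round(0.299 * r + 0.587 * g + 0.114 * b);
                img.setRGB(xx, yy, (argb & 0xFF000000) | (lum << 16) | (lum << 8) | lum);
            }
        }
    }

    public static void main(String[] args) {
        BufferedImage img = new BufferedImage(4, 4, BufferedImage.TYPE_INT_RGB);
        img.setRGB(1, 1, 0xFF0000);
        grayRegion(img, 0, 0, 2, 2);
        System.out.printf("%06X%n", img.getRGB(1, 1) & 0xFFFFFF); // prints 4C4C4C
    }
}
```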

Image replacement using PDFBox does not change size of pdf according to image

I am using PDFBox 2.0.8 to replace an image in my application. I am able to extract the image and replace it with another image of the same dimensions. However, the PDF does not get smaller when the replacement image is smaller. For example, see the documents/images in the links below: the original PDF is 93 KB, the extracted image is 91 KB, and the replacement image is 54 KB, yet the PDF after image replacement is still 92 KB.
Original Document = http://35.200.192.44/download?fileName=/outbox/pdf/10_cert.pdf
Extracted Image = http://35.200.192.44/download?fileName=/outbox/pdf/image0.jpg
Replacement image = http://35.200.192.44/download?fileName=/outbox/pdf/image1.jpg
PDF after replacement = http://35.200.192.44/download?fileName=/outbox/pdf/10_cert1.pdf.
The size of the PDF after replacement does not shrink in proportion to the image. The code used for image replacement is:
BufferedImage buffered_replacement_image_file = ImageIO.read(new File(replacement_image_file));
PDImageXObject replacement_img = JPEGFactory.createFromImage(doc, buffered_replacement_image_file);
resources.put(xObjectName, replacement_img);
The images in your two example PDFs are identical. This most likely is due to the way you load the image data, first creating a BufferedImage from the file and then creating a PDImageXObject from that BufferedImage. This causes the input image data to be expanded to a plain bitmap and then re-compressed to JPEG identically by JPEGFactory.createFromImage.
To use the JPEG data as they initially are, try this approach instead:
PDImageXObject replacement_img = JPEGFactory.createFromStream(doc, new FileInputStream(replacement_image_file));
resources.put(xObjectName, replacement_img);
or, if the replacement_image_file is not necessarily a JPEG file, like this
PDImageXObject replacement_img = PDImageXObject.createFromFileByExtension(new File(replacement_image_file), doc);
resources.put(xObjectName, replacement_img);
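PDImageXObject.createFromFileByExtension chooses the decoder from the file name extension. If you would rather decide by content whether the file can go through JPEGFactory.createFromStream unchanged, you could sniff the JPEG start-of-image marker (bytes FF D8 followed by another marker starting with FF). A minimal sketch; the class and method names are hypothetical:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: checks whether data starts like a JPEG file. */
public final class JpegSniffer {

    /** True if the bytes start with FF D8 FF (SOI marker followed by another marker). */
    public static boolean looksLikeJpeg(byte[] head) {
        return head.length >= 3
                && (head[0] & 0xFF) == 0xFF
                && (head[1] & 0xFF) == 0xD8
                && (head[2] & 0xFF) == 0xFF;
    }

    public static boolean looksLikeJpeg(Path file) throws IOException {
        byte[] head = new byte[3];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            // read() may return fewer bytes than requested, so loop until full or EOF
            while (n < head.length) {
                int r = in.read(head, n, head.length - n);
                if (r < 0) {
                    break;
                }
                n += r;
            }
        }
        return n == head.length && looksLikeJpeg(head);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(looksLikeJpeg(Paths.get(args[0])));
    }
}
```

If the check passes, the original JPEG bytes can be embedded as-is; otherwise fall back to the extension-based factory.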
If this doesn't help, you most likely have other issues in your code, too, and need to show more of it.

Storing jpg images into a pdf file in a "lossless" way

Given a directory with several JPG files (photos), I would like to create a single PDF file with one photo per page.
However, I would like the photos to be stored in the PDF file unchanged, i.e., without decoding and re-encoding them.
So ideally I would like to be able to extract the original JPG files (perhaps minus the metadata) from the PDF file using, e.g., a Linux command-line tool like pdfimages.
My ideas so far:
ImageMagick convert. However, I am confused by the compression options: if I choose 100% quality, does that mean the JPG is decoded internally and then re-encoded losslessly? (Which is obviously not what I want.)
pdflatex. Some people claim that the graphics package includes images losslessly, while others dispute that. In any case, pdflatex would be somewhat more cumbersome (I would first have to find out the dimensions of the photos, then set the page size accordingly, make sure that there are no margins, headers, etc.).
img2pdf (PyPI page):
Losslessly convert raster images to PDF without re-encoding PNG, JPEG, and JPEG2000 images. This leads to a lossless conversion of PNG, JPEG and JPEG2000 images with the only added file size coming from the PDF container itself. Other raster graphics formats are losslessly stored using the same encoding that PNG uses. Since PDF does not support images with transparency and since img2pdf aims to never be lossy, input images with an alpha channel are not supported.
(pdfimages -all does the exact opposite.)
You could use the following small script which relies on HexaPDF (note: I'm the author of HexaPDF) to do this.
Note: Make sure you have Ruby 2.4 installed, then run gem install hexapdf to install hexapdf.
Here is the script:
require 'hexapdf'

doc = HexaPDF::Document.new
ARGV.each do |image_file|
  image = doc.images.add(image_file)
  page = doc.pages.add
  iw = image.info.width.to_f
  ih = image.info.height.to_f
  pw = page.box(:media).width.to_f
  ph = page.box(:media).height.to_f
  rw, rh = pw / iw, ph / ih
  ratio = [rw, rh].min
  iw, ih = iw * ratio, ih * ratio
  x, y = (pw - iw) / 2, (ph - ih) / 2
  page.canvas.image(image, at: [x, y], width: iw, height: ih)
end
doc.write('images.pdf')
Just supply the images as arguments on the command line, the output file will be named images.pdf. Most of the code deals with centering and scaling the images to nicely fit onto the pages.
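The centering and scaling in the script boil down to a small piece of arithmetic: scale uniformly by min(pw/iw, ph/ih) so the image fits, then split the leftover space equally on both sides. The same computation in Java, as a sketch for readers porting the idea to another library (class and method names are hypothetical):

```java
/** Hypothetical helper: fits an image onto a page, preserving aspect ratio, and centers it. */
public final class FitAndCenter {

    /** Returns {x, y, width, height} in page units for an iw x ih image on a pw x ph page. */
    public static double[] place(double iw, double ih, double pw, double ph) {
        // The smaller ratio guarantees both dimensions fit on the page
        double ratio = Math.min(pw / iw, ph / ih);
        double w = iw * ratio;
        double h = ih * ratio;
        // Center by splitting the remaining space evenly
        return new double[] {(pw - w) / 2, (ph - h) / 2, w, h};
    }

    public static void main(String[] args) {
        // A 1000 x 500 px photo on a roughly A4-sized page of 595 x 842 pt
        double[] p = place(1000, 500, 595, 842);
        System.out.printf("x=%.2f y=%.2f w=%.2f h=%.2f%n", p[0], p[1], p[2], p[3]);
        // prints x=-0.00 y=272.25 w=595.00 h=297.50 (x is a tiny rounding residue)
    }
}
```

A wide photo is therefore limited by the page width and centered vertically; a tall one is limited by the page height and centered horizontally.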
Another possibility for storing jpg images into a pdf file in a "lossless" way is provided by PoDoFo:
podofoimg2pdf is able to perform lossless conversion from JPEG to PDF by embedding the jpg file into the pdf container.
podofoimg2pdf
Usage: podofoimg2pdf [output.pdf] [-useimgsize] [image1 image2 image3 ...]
Options:
-useimgsize Use the imagesize as page size, instead of A4
Depending on what you wish to do with the files: on Windows, if the images are simple JPEG/GIF/TIFF/PNG files, you can store them in a CBZ, a ZIP, a folder, or a zipped folder and view them with SumatraPDF, which has a Save As PDF option, so it is all done with one executable.
It will fail with files that are viewable but not acceptable as PDF inputs, such as WebP or HEIC, so check the filename extension in the viewer beforehand.
It should be lossless in practically all cases; however, you should round-trip with pdfimages -all and compare the input and output files to check that no bytes needed to be converted.
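The round-trip check can be done by hashing each original photo and the corresponding file written by pdfimages -all and comparing the digests. A minimal sketch using the JDK's SHA-256 implementation; the class name and the file paths in the usage comment are hypothetical. Note that a byte comparison reports a difference if the tool altered or stripped metadata, even when the image data itself is untouched.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper: checks that an extracted image is byte-identical to the original. */
public final class RoundTripCheck {

    /** Hex-encoded SHA-256 of the given bytes. */
    public static String sha256Hex(byte[] data) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(data);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xFF));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256
            throw new IllegalStateException(e);
        }
    }

    /** True if both files have exactly the same bytes. */
    public static boolean sameBytes(Path a, Path b) throws IOException {
        return sha256Hex(Files.readAllBytes(a)).equals(sha256Hex(Files.readAllBytes(b)));
    }

    public static void main(String[] args) throws IOException {
        // Usage (hypothetical paths): java RoundTripCheck photo1.jpg extracted/img-000.jpg
        boolean same = sameBytes(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println(same ? "identical" : "DIFFERENT");
    }
}
```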

Reading CMYK colors for graphic vectors from PDF

I am trying to read CMYK colors of vector graphics from a PDF file using PDFBox 2. The color space returned is a PDSeparation with an alternate color space of PDDeviceCMYK. I didn't know how to proceed with PDDeviceCMYK, so I extracted the RGB colors, planning to convert them back to CMYK, but I couldn't find a function for that conversion either. Is there a way to extract the CMYK colors directly from PDDeviceCMYK?
PDColor color = getGraphicsState().getNonStrokingColor();
PDSeparation colorSpace = (PDSeparation) color.getColorSpace();
float[] rgb = colorSpace.toRGB(color.getComponents());
There are no CMYK colours in a Separation space; it's a spot colour, for example a Pantone colour or something like Silver or Gold. You print it using the specific required ink.
In order to print (and display) the content on devices which don't have the required ink, Separation spaces have an Alternate colour space and a method for converting the input ink percentage into that colour space.
In your case the Alternate is DeviceCMYK and there will be a PDF Function which takes 1 input and returns 4 outputs. Given a colour between 0 and 1 of the Separation ink, it will return the equivalent CMYK values.
There are no RGB components for you to recover from the file either, I presume that colorSpace.toRGB() is retrieving the ink value, running the function to convert that to CMYK and then converting the CMYK to RGB. Assuming that pdfbox has a colorSpace.toCMYK() function I would use that instead.
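What the tint transform computes depends on its function type. A common case is a Type 2 (exponential interpolation) function, defined in the PDF specification as out[j] = C0[j] + t^N · (C1[j] − C0[j]). A minimal sketch, assuming a Type 2 function with Domain [0 1], C0 = all zeros (no ink) and C1 = the CMYK equivalent of the full-strength spot ink; the tint transform in a given file may instead be a sampled (Type 0) or PostScript calculator (Type 4) function, which this does not cover. The class name is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helper: evaluates a PDF Type 2 (exponential) function for one input. */
public final class ExponentialTint {

    public static float[] eval(float t, float[] c0, float[] c1, double n) {
        float tc = Math.max(0f, Math.min(1f, t)); // clip the input to the Domain [0, 1]
        double p = Math.pow(tc, n);
        float[] out = new float[c0.length];
        for (int j = 0; j < c0.length; j++) {
            out[j] = (float) (c0[j] + p * (c1[j] - c0[j]));
        }
        return out;
    }

    public static void main(String[] args) {
        float[] noInk = {0f, 0f, 0f, 0f};
        float[] fullInk = {0f, 0.5f, 1f, 0f}; // assumed CMYK of the spot ink at 100%
        System.out.println(Arrays.toString(eval(0.5f, noInk, fullInk, 1.0)));
        // prints [0.0, 0.25, 0.5, 0.0]
    }
}
```

With N = 1 this is plain linear interpolation, so a 50% tint of the spot ink maps to half of each CMYK component.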
In addition to what @KenS said in his first comment, and with the help of @Tilman, you can extract the CMYK colors by reproducing the protected/private tint-transform code from PDSeparation.java. I am not posting the entire code, only the section that reads the colors:
// PDFBox 2.x: COSArray (org.apache.pdfbox.cos), PDFunction (org.apache.pdfbox.pdmodel.common.function)
// A Separation colour space array is [/Separation name alternateSpace tintTransform]
private static final int TINT_TRANSFORM = 3;

PDColor color = getGraphicsState().getNonStrokingColor();
COSArray array = (COSArray) color.getColorSpace().getCOSObject();
// Evaluate the tint transform: 1 tint input -> 4 outputs in the DeviceCMYK alternate space
PDFunction tintTransform = PDFunction.create(array.getObject(TINT_TRANSFORM));
float[] cmykColor = tintTransform.eval(color.getComponents()); // may throw IOException

How to compress images (png, jpg and so on) using objective C

I want to shrink PNG or JPG files on OS X without affecting the image quality, like tinypng.org does.
Is there a recommended library? I only know ImageMagick. Is there a way to do this natively, or another library that can shrink/compress images without affecting the image quality?
my aim is to shrink the file size, for example:
logo.png >> 476 k before shrink
logo.png >> 50k after shrink
Edit: to be clear, I want to reduce the file size, not the image resolution.
TinyPNG.org works by using image quantisation: similar colours in the image are converted into an HSV or RGB model and then merged depending on the distance between them.
How does it work?
...
When you upload a PNG (Portable Network Graphics) file, similar colours in your image are combined. This technique is called “quantisation”
...
src: http://tinypng.org
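Quantisation has two parts: choosing a small palette (median cut, octree, or a neural-network method such as the one pngnq uses) and then mapping every pixel to its closest palette colour. A minimal sketch of the second part only, assuming a palette is already given and using squared Euclidean distance in RGB (a perceptual space such as LAB usually gives better results). The class and method names are hypothetical, and this is not TinyPNG's implementation, written in Java to illustrate the algorithm rather than as Objective-C:

```java
/** Hypothetical helper: maps pixels to the nearest colour of a fixed palette. */
public final class NearestPalette {

    /** Returns the palette entry (0xRRGGBB) with the smallest squared RGB distance to rgb. */
    public static int nearest(int rgb, int[] palette) {
        int best = palette[0];
        long bestDist = Long.MAX_VALUE;
        for (int p : palette) {
            int dr = ((rgb >> 16) & 0xFF) - ((p >> 16) & 0xFF);
            int dg = ((rgb >> 8) & 0xFF) - ((p >> 8) & 0xFF);
            int db = (rgb & 0xFF) - (p & 0xFF);
            long d = (long) dr * dr + (long) dg * dg + (long) db * db;
            if (d < bestDist) {
                bestDist = d;
                best = p;
            }
        }
        return best;
    }

    /** Replaces every pixel with its nearest palette colour, in place. */
    public static void remap(int[] pixels, int[] palette) {
        for (int i = 0; i < pixels.length; i++) {
            pixels[i] = nearest(pixels[i], palette);
        }
    }

    public static void main(String[] args) {
        int[] palette = {0xFF0000, 0x00FF00, 0x0000FF};
        System.out.printf("%06X%n", nearest(0xFA0A0A, palette)); // prints FF0000
    }
}
```

Fewer distinct colours let the PNG be stored as an 8-bit indexed image, which is where most of the file size reduction comes from.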
An answer here outlines a method of doing so: https://stackoverflow.com/a/492230/556479.
There are also some answers on this question which explain how you can do so on Mac OS X using Objective-C: How do I reduce a bitmap to a known set of RGB colours
See Wikipedia for a more in depth guide: http://en.wikipedia.org/wiki/Color_quantization
Did you have a problem using ImageMagick? It has a rich set of quantize functions such as
bool MagickQuantizeImage( MagickWand mgck_wnd,
float number_colors,
int colorspace_type,
float treedepth,
bool dither,
bool measure_error )
Here is a very thorough guide to quantization using ImageMagick.
My suggestion is to use http://pngnq.sourceforge.net, it will give better results than ImageMagick and for the single example given in http://tinypng.org, it also produces a very similar output. It is a tiny C implementation of the method present in the paper "Kohonen Neural Networks for Optimal Colour Quantization". That alone is much better since you are no longer relying on closed unknown implementations.
Original (57 KB), tinypng.org (16 KB), pngnq (17 KB):
Using ImageMagick, the best quantization to 256 colors I can get uses the LAB colorspace and dithering by Floyd-Steinberg:
convert input.png -quantize LAB -dither FloydSteinberg -colors 256 output.png
This produces a 16 KB PNG, but it contains many more visual artifacts.
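The -dither FloydSteinberg option spreads each pixel's quantisation error onto its not-yet-processed neighbours with weights 7/16 (right), 3/16 (below left), 5/16 (below) and 1/16 (below right). A minimal sketch of that error diffusion, simplified to an 8-bit grayscale image reduced to pure black and white; with a colour palette the same diffusion is applied per channel against the nearest palette colour. This is an illustration of the technique, not ImageMagick's code, and the class name is hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helper: Floyd-Steinberg dithering of 8-bit grayscale to two levels. */
public final class FloydSteinberg {

    /** gray: row-major width*height values in 0..255; returns values that are 0 or 255. */
    public static int[] dither(int[] gray, int width, int height) {
        float[] buf = new float[gray.length];
        for (int i = 0; i < gray.length; i++) {
            buf[i] = gray[i];
        }
        int[] out = new int[gray.length];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int i = y * width + x;
                int q = buf[i] < 128 ? 0 : 255; // nearest of the two output levels
                out[i] = q;
                float err = buf[i] - q;
                // Push the error onto neighbours that have not been quantised yet
                if (x + 1 < width) buf[i + 1] += err * 7 / 16;
                if (y + 1 < height) {
                    if (x > 0) buf[i + width - 1] += err * 3 / 16;
                    buf[i + width] += err * 5 / 16;
                    if (x + 1 < width) buf[i + width + 1] += err * 1 / 16;
                }
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(dither(new int[]{128, 128}, 2, 1)));
        // prints [255, 0]
    }
}
```

Dithering keeps the average tone of an area close to the original, at the cost of the grainy patterns visible in heavily quantised output.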