Image replacement using PDFBox does not change size of PDF according to image

I am using PDFBox 2.0.8 to replace an image in my application. I can extract the image and replace it with another image of the same dimensions. However, the PDF does not get smaller when the replacement image is smaller. For example, see the documents and images at the links below. The original PDF is 93 KB, the extracted image is 91 KB, and the replacement image is 54 KB, yet the PDF after replacement is still 92 KB....
Original Document = http://35.200.192.44/download?fileName=/outbox/pdf/10_cert.pdf
Extracted Image = http://35.200.192.44/download?fileName=/outbox/pdf/image0.jpg
Replacement image = http://35.200.192.44/download?fileName=/outbox/pdf/image1.jpg
PDF after replacement = http://35.200.192.44/download?fileName=/outbox/pdf/10_cert1.pdf.
The PDF size does not change in proportion to the image size... The code snippet used for the image replacement is:
BufferedImage buffered_replacement_image_file = ImageIO.read(new File(replacement_image_file));
PDImageXObject replacement_img = JPEGFactory.createFromImage(doc, buffered_replacement_image_file);
resources.put(xObjectName, replacement_img);

The images in your two example PDFs are identical. This most likely is due to the way you load the image data, first creating a BufferedImage from the file and then creating a PDImageXObject from that BufferedImage. This causes the input image data to be expanded to a plain bitmap and then re-compressed to JPEG identically by JPEGFactory.createFromImage.
To use the JPEG data as they initially are, try this approach instead:
PDImageXObject replacement_img = JPEGFactory.createFromStream(doc, new FileInputStream(replacement_image_file));
resources.put(xObjectName, replacement_img);
or, if the replacement_image_file is not necessarily a JPEG file, like this
PDImageXObject replacement_img = PDImageXObject.createFromFileByExtension(new File(replacement_image_file), doc);
resources.put(xObjectName, replacement_img);
If this doesn't help, you most likely have other issues in your code, too, and need to show more of it.

Related

Difference in the output seen in RadiAnt Viewer after saving/writing the same DICOM image using sitk.WriteImage

I'm reading a DICOM image and accessing its pixel array. I then save that array in DICOM format again using sitk.WriteImage, but the written image looks different from the original one. How can I get the same display? I'm using RadiAnt Viewer to visualize the DICOM images, and I want the output to match the input. The code, along with the input and output images, is given below:
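import pydicom
import SimpleITK as sitk

# Read a DICOM image and get its pixel data as an array
Image = pydicom.dcmread('Input.dcm')
output = Image.pixel_array
# Save the array as a new DICOM file (only the pixel data is carried over)
img = sitk.GetImageFromArray(output)
sitk.WriteImage(img, 'output.dcm')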
The DICOM files are too large to attach, so I am sending screenshots of the input and output images instead.
I'm going to answer my own question here. The Photometric Interpretation of my original image was MONOCHROME1. After converting the image to a pixel array and saving it again in .dcm format, some of its details had changed, one of which was the Photometric Interpretation: it changed from MONOCHROME1 to MONOCHROME2. I set it back on the saved image and then saved again, as follows.
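# `image` is the pydicom dataset of the saved file (loaded with pydicom.dcmread)
elem = image[0x0028, 0x0004]  # Photometric Interpretation tag
elem.value = 'MONOCHROME1'
image.save_as('P1_L_CC.dcm', write_like_original=False)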
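For context on why the two values render differently: in MONOCHROME1 the lowest pixel value is displayed as white, in MONOCHROME2 as black. Instead of restoring the tag, another option is to keep MONOCHROME2 and invert the pixel values. Below is a minimal sketch in plain Python, assuming unsigned pixel data; `invert_monochrome` is a hypothetical helper name, and `bits_stored` is meant to be the value of the file's Bits Stored (0028,0101) element. Window and rescale settings may need adjusting as well, which this sketch does not cover.

```python
def invert_monochrome(pixels, bits_stored):
    """Invert unsigned grayscale values so that a MONOCHROME1 image
    looks the same when labeled MONOCHROME2 (and vice versa)."""
    max_value = (1 << bits_stored) - 1  # e.g. 4095 for 12-bit data
    return [[max_value - value for value in row] for row in pixels]


# Tiny 2x2 example with 12-bit values (made-up data, not from a real file).
rows = [[0, 4095], [1000, 3095]]
inverted = invert_monochrome(rows, bits_stored=12)
```

Applying the function twice gives back the original values, so it works in both directions.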

Storing jpg images into a pdf file in a "lossless" way

Given a directory with several jpg files (photos), I would
like to create a single pdf file with one photo per page.
However, I would like the photos to be stored in the pdf file unchanged; i.e., I would like to avoid decoding and recoding.
So ideally I would like to be able to extract the original jpg files (maybe minus the metadata) from the pdf file, using, e.g., a Linux command line tool like pdfimages.
My ideas so far:
imagemagick convert. However, I am confused by the compression options: if I choose 100% quality, does it mean that the jpg is internally decoded and then re-encoded losslessly? (Which is obviously not what I want.)
pdflatex. Some people claim that the graphics package includes images losslessly, while others dispute that. In any case, pdflatex would be slightly more cumbersome (I would first have to find out the dimensions of the photos, then set the page size accordingly, and make sure that there are no margins, headers, etc.).
img2pdf (PyPI page):
Losslessly convert raster images to PDF without re-encoding PNG, JPEG, and
JPEG2000 images. This leads to a lossless conversion of PNG, JPEG and JPEG2000
images with the only added file size coming from the PDF container itself.
Other raster graphics formats are losslessly stored using the same encoding
that PNG uses. Since PDF does not support images with transparency and since
img2pdf aims to never be lossy, input images with an alpha channel are not
supported.
(pdfimages -all does the exact opposite.)
You could use the following small script which relies on HexaPDF (note: I'm the author of HexaPDF) to do this.
Note: Make sure you have Ruby 2.4 installed, then run gem install hexapdf to install hexapdf.
Here is the script:
require 'hexapdf'
doc = HexaPDF::Document.new
ARGV.each do |image_file|
  image = doc.images.add(image_file)
  page = doc.pages.add
  iw = image.info.width.to_f
  ih = image.info.height.to_f
  pw = page.box(:media).width.to_f
  ph = page.box(:media).height.to_f
  rw, rh = pw / iw, ph / ih
  ratio = [rw, rh].min
  iw, ih = iw * ratio, ih * ratio
  x, y = (pw - iw) / 2, (ph - ih) / 2
  page.canvas.image(image, at: [x, y], width: iw, height: ih)
end
doc.write('images.pdf')
Just supply the images as arguments on the command line; the output file will be named images.pdf. Most of the code deals with centering and scaling the images to fit nicely onto the pages.
Another possibility for storing jpg images into a pdf file in a "lossless" way is provided by PoDoFo:
podofoimg2pdf is able to perform lossless conversion from JPEG to PDF by embedding the jpg file into the pdf container.
podofoimg2pdf
Usage: podofoimg2pdf [output.pdf] [-useimgsize] [image1 image2 image3 ...]
Options:
-useimgsize Use the imagesize as page size, instead of A4
Depending on what you wish to do with the files: on Windows, if the images are simple JPEG/GIF/TIF/PNG files, you can store them in a cbz, zip, folder or zipped folder and view them with SumatraPDF, which has a Save As PDF option, so everything is done with one exe.
It will fail with files that are viewable but not acceptable as PDF inputs, such as webp or heic, so check the filename extension in the viewer beforehand.
It should be lossless in practically all cases; however, you should round-trip with pdfimages -all and do a file compare between input and output to check that no bytes had to be converted.
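The file compare mentioned above can be done with any diff tool. As one possible sketch, here it is in Python using only the standard library; `sha256_of` and `same_bytes` are hypothetical helper names, and the file names in the usage note are placeholders.

```python
import hashlib
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def same_bytes(original, extracted):
    """True if both files contain exactly the same bytes."""
    return sha256_of(original) == sha256_of(extracted)
```

After something like `pdfimages -all images.pdf out`, calling `same_bytes("photo.jpg", "out-000.jpg")` tells you whether the JPEG came back byte for byte. If the tool that built the PDF stripped metadata, the bytes can differ even though the image data was never re-encoded.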

How can I crop by x,y,width,height all images in a folder, resize them, then save them?

I am new to Photoshop scripting, but no stranger to Javascript.
I have a folder of 1024*1024 images that are frames of an animation from a 3D program.
Only the area at x=54, y=12, width=300, height=234 is of interest in each frame.
After the crop I would like them to be scaled to 65%, or whatever I want.
Alternatively, the source image could be scaled to 65% first and then cropped at the correspondingly scaled x/y coordinates, so that the outside pixels don't make it into the final product.
There are no PSDs to speak of; I assume the script would create a blank PSD and most likely recycle it for the batch crop/resize.
Try something along these lines for the cropping and resizing. You can also copy all the images into a single PSD before you save if that is what you're after, but this sample just saves over the original document. For more info check out your Photoshop JavaScript Reference pdf in your Photoshop install directory.
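// use pixel ruler units so the crop bounds below are read as pixels
app.preferences.rulerUnits = Units.PIXELS;
var dir = new Folder('/c/temp');
var files = dir.getFiles("*.psd"); // change for whatever file type you have
var scale = 0.65; // target scale after cropping
for (var i = 0; i < files.length; i++) {
    var doc = app.open(files[i]);
    // bounds are [left, top, right, bottom]: x=54, y=12, width=300, height=234
    var bounds = [54, 12, 354, 246];
    doc.crop(bounds);
    // do the math to figure out how big you want it after resize
    var newWidth = new UnitValue(Math.round(300 * scale), 'px');
    var newHeight = new UnitValue(Math.round(234 * scale), 'px');
    doc.resizeImage(newWidth, newHeight);
    // note this is saving over the original!!!!
    doc.close(SaveOptions.SAVECHANGES);
}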
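The numbers in the script follow from the question's x, y, width and height: the crop bounds are given as left, top, right, bottom, so right = x + width and bottom = y + height. A small Python sketch of that arithmetic (the helper names are made up for illustration):

```python
def crop_bounds(x, y, width, height):
    """Convert x, y, width, height into [left, top, right, bottom]."""
    return [x, y, x + width, y + height]


def scaled_size(width, height, scale):
    """Return the size after uniform scaling, rounded to whole pixels."""
    return round(width * scale), round(height * scale)


bounds = crop_bounds(54, 12, 300, 234)
new_width, new_height = scaled_size(300, 234, 0.65)
```

For the values in the question this gives bounds of [54, 12, 354, 246] and a resized image of 195 x 152 pixels.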
Why not just record an action in Photoshop?
Open the first file and create a new action.
Trim the canvas as needed with either the crop tool or canvas resize.
Run the action as a batch process on the folder.

Apply color band to TIFF in the PDF

• Background :
We are developing an AFP-to-PDF tool. It converts AFP (Advanced Function Presentation) files to PDF.
• Detailed Problem statement :
We have an AFP file with an embedded TIFF image. The image object is described in Function Set 45 and looks roughly like this:
Image Content
    Begin Tile
        Image Encoding Parameter – TIFF LZW
        Begin Transparency Mask
            Image Encoding Parameter – G4MMR
            Image Data Elements
        End Transparency Mask
        Image Data Elements (IDE Size 32) – 4 bands: CMYK
    End Tile
End Image Content
We want to write this tiled image to PDF using either Java or the iText API.
As of now, we can write the G4MMR image, but we are not able to apply the CMYK color band data (blue color) to this image.
• Solution tried :
The code to write G4MMR image goes as follows –
ByteArrayOutputStream decode = saveAsTIFF(<width>,<height>,<imageByteData>);
// read the TIFF bytes returned by saveAsTIFF
RandomAccessFileOrArray ra = new RandomAccessFileOrArray(decode.toByteArray());
int pages = TiffImage.getNumberOfPages(ra);
for (int i1 = 1; i1 <= pages; i1++) {
    // img1 ends up holding the last page of the TIFF
    img1 = TiffImage.getTiffImage(ra, i1);
}
img1.scaleAbsolute(256, 75);
document.add(img1);
saveAsTIFF method is given here –
http://www.jpedal.org/PDFblog/2011/08/ccitt-encoding-in-pdf-files-converting-pdf-ccitt-data-into-a-tiff/
As mentioned, we are not able to apply CMYK 4 band image color data to this G4MMR image.
• Technology stack with versions of each component :
1. JDK 1.6
2. itextpdf-5.1
-- Umesh Pathak
The AFP resource you're showing is a TIFF CMYK image compressed with LZW. The image also uses a "transparency mask", which is compressed with G4MMR (a slightly different encoding than the traditional fax-style G4).
So the image data already uses the CMYK color space. Each band (C, M, Y, K) is compressed on its own using simple LZW encoding, so it should not be too difficult to extract the bands and store them as a basic TIFF CMYK file. You'll also have to convert the transparency mask to G4 or raw data to use it in a PDF file to mask the CMYK image.
If you want better control over the PDF output, I suggest you take a look at PDFlib.
You need to add a CMYK color space to your image before adding it to the PDF file. However, I am afraid this might not be fully supported in iText. A workaround could be to convert your image to the default RGB color space before adding it to the PDF file, although this will probably cost some image quality.
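To give an idea of what the RGB workaround does to the data, here is a rough Python sketch of the naive CMYK to RGB formula. It assumes 8-bit bands where 255 means full ink and uses no ICC profile, which is one reason the colors will only be approximate. `cmyk_to_rgb` is a made-up helper for illustration, not an iText call.

```python
def cmyk_to_rgb(c, m, y, k):
    """Naive conversion of 8-bit CMYK (0-255, 255 = full ink) to 8-bit RGB."""
    def channel(ink):
        # Each RGB channel is dimmed by its own ink and by the black ink.
        return round(255 * (1 - ink / 255) * (1 - k / 255))

    return channel(c), channel(m), channel(y)


# Full cyan plus full magenta with no yellow and no black gives pure blue.
blue = cmyk_to_rgb(255, 255, 0, 0)
```

A color-managed conversion would map the values through the source and destination profiles instead of this formula.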

How to get DPI, width and length of an image in PDF in PHP

Suppose a single image is saved as a PDF. How can I get the DPI, width and length of the image in the PDF file? How can I do it in PHP? Basically I want to retrieve the following information:
I uploaded my PDF to a particular private website and got the following information:
Size of input file: 285.81 KB
Import time: 0 sec
Source document has been created with Adobe Photoshop CS4 Macintosh
PDF file has been produced using Adobe Photoshop for Macintosh -- Image Conversion Plug-in
Creation date of source document is: D:20091002102636+05'30'
Recognized page format at import: Custom (2.997 cm x 5.004 cm)
Document contains 1 page(s).
The following file properties may cause problems:
Page 1 (Image4: Pos. x: 1.499 cm y:2.502 cm, width: 2.997 cm height: 5.004 cm): Resolution of grayscale image too high (found: 300.00 dpi - demanded: 170.00 dpi)
DPI is irrelevant in PDFs themselves. Your concern is with the image embedded within the PDF, and to parse it out easily you will probably want to use a library that handles the file I/O for you, such as http://www.pdflib.com
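Put differently, the resolution in such a report is not stored in the file; it follows from the pixel dimensions of the embedded image and the size at which it is placed on the page. A small sketch of that calculation, in Python rather than PHP just to show the arithmetic. The pixel size of 354 x 591 is an assumption chosen to match the report above, and `effective_dpi` is a hypothetical helper name.

```python
CM_PER_INCH = 2.54


def effective_dpi(pixels, size_cm):
    """Resolution of an image side that has `pixels` samples
    and is placed at `size_cm` centimeters on the page."""
    return pixels / (size_cm / CM_PER_INCH)


# 354 x 591 px is an assumed pixel size, not read from the actual file.
dpi_x = effective_dpi(354, 2.997)
dpi_y = effective_dpi(591, 5.004)
```

Both values come out at roughly 300, which agrees with the "found: 300.00 dpi" line in the report. The same division works in PHP once a PDF library has given you the image's pixel dimensions and its placed size.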