We have PDF documents (source: camera or scanner) that we want to convert to JPEG.
We use LEADTOOLS and PDF-TOOLS (in two separate programs) to convert these PDF files to JPEG files.
Both tools use a default DPI of 150, irrespective of the DPI of the source PDF file.
We would prefer this value to be taken from the source PDF file.
For example, Adobe Acrobat recognizes the DPI of the source PDF file and uses it when creating the JPEG file.
Is there some way to achieve the same with LEADTOOLS and PDF-TOOLS by determining the DPI of the source PDF file?
This feature was added to v19 of LEADTOOLS a few months ago. You can now extract images from PDF pages while preserving their original pixel dimensions using the following members of the Leadtools.Pdf.PDFDocument class:
ParseDocumentStructure method.
Images property.
DecodeImage method.
Furthermore, if the image is stretched inside the PDF page, you can detect that by examining its display size in the PDF page using the Leadtools.Pdf.PDFObject.Bounds property.
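To illustrate how the pixel dimensions and the display size combine into a DPI value, here is a minimal sketch in Python (not LEADTOOLS code). It assumes the display size is expressed in PDF points (1/72 inch), which is the default PDF user space unit; check the LEADTOOLS documentation for the units that Bounds actually uses. The function name and parameters are hypothetical.

```python
def effective_dpi(pixel_width, pixel_height, bounds_width_pt, bounds_height_pt):
    """Return (horizontal, vertical) DPI of an image as placed on a PDF page.

    Assumes the display size is given in PDF points, i.e. 1/72 inch.
    """
    if bounds_width_pt <= 0 or bounds_height_pt <= 0:
        raise ValueError("display size must be positive")
    # inches = points / 72, and DPI = pixels / inches
    dpi_x = pixel_width * 72.0 / bounds_width_pt
    dpi_y = pixel_height * 72.0 / bounds_height_pt
    return dpi_x, dpi_y


# A 2550 x 3300 pixel scan displayed on a US Letter page (612 x 792 points)
print(effective_dpi(2550, 3300, 612, 792))  # (300.0, 300.0)
```

If the two values differ noticeably, the image is stretched on the page, and you have to decide which one to pass to the JPEG writer.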
There's a dedicated demo for the PDFDocument class and related objects installed with LEADTOOLS 19 in these folders:
Examples\DotNet\CS\PDFDocumentDemo
Examples\DotNet\VB\PDFDocumentDemo
I have a PDF with 3 images.
I want to find each image and replace it with another image.
I saw the original paths in the PDF under xmpMM:Ingredients.
I tried to change them with Notepad++, but it looks like the images are already embedded, and changing the path does nothing.
How can I find each image and replace it with another image?
The xmp stuff is information only. The actual images are embedded streams in the pdf file. Finding the correct streams to replace and replacing them isn't a simple problem, and can't be done with notepad. You'll need a library / toolkit that can modify PDFs, like https://pdf-lib.js.org/ or similar.
The PDF file looks like an Illustrator file, which adds another layer of weirdness - Illustrator can write PDFs that have both PDF and Illustrator versions of the content, and you see one in Acrobat and the other in Illustrator.
It's probably easier to recreate the PDF from whatever source produced it.
I have a PDF file that contains an image, and the image is displayed correctly. When I extract the image from the PDF file using the iTextSharp or PDFsharp libraries, I get bytes, which I decode successfully (because there is a /Filter/FlateDecode entry). But when I try to convert these bytes to an image using different libraries, an exception occurs (it looks like the bytes are not actually an image). As far as I understand, the problem is in processing these bytes, because the image in the PDF is not corrupted: it is shown correctly there. PDF is here.
The images are most likely stored in the PDF image format which is documented in the PDF specification.
It is rather simple to convert them to the Windows BMP format. But still you must convert them and add headers with the specific information from the image attributes from the PDF file.
In PDF each image row is byte-aligned; in Windows BMP each row is DWORD-aligned (padded to a multiple of 4 bytes).
Don't forget to extract the colour table if there is one.
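As a rough illustration of the conversion described above, here is a minimal Python sketch. It assumes the simplest case: the stream has already been Flate-decoded into 8-bit DeviceRGB samples with no predictor, and the width and height were read from the image dictionary. Indexed images (which need the colour table), other bit depths, and other colour spaces are not handled. The function name is hypothetical, and the resolution written to the header is a placeholder.

```python
import struct


def pdf_rgb_to_bmp(samples: bytes, width: int, height: int) -> bytes:
    """Wrap decoded 8-bit DeviceRGB samples in a 24-bit BMP file."""
    src_stride = width * 3               # PDF rows are byte-aligned
    dst_stride = (src_stride + 3) & ~3   # BMP rows are padded to 4 bytes
    if len(samples) < src_stride * height:
        raise ValueError("not enough sample data for the given dimensions")
    padding = b"\x00" * (dst_stride - src_stride)

    rows = []
    # BMP stores rows bottom-up, PDF image data is top-down
    for y in range(height - 1, -1, -1):
        row = samples[y * src_stride:(y + 1) * src_stride]
        bgr = bytearray(row)
        bgr[0::3] = row[2::3]            # BMP expects BGR, PDF gives RGB
        bgr[2::3] = row[0::3]
        rows.append(bytes(bgr) + padding)
    pixel_data = b"".join(rows)

    header_size = 14 + 40                # BITMAPFILEHEADER + BITMAPINFOHEADER
    file_header = struct.pack(
        "<2sIHHI", b"BM", header_size + len(pixel_data), 0, 0, header_size)
    info_header = struct.pack(
        "<IiiHHIIiiII",
        40, width, height,               # header size, dimensions
        1, 24, 0,                        # planes, bits per pixel, no compression
        len(pixel_data),
        2835, 2835,                      # placeholder: about 72 DPI in pixels/metre
        0, 0)                            # no colour table for 24-bit images
    return file_header + info_header + pixel_data
```

For example, `open("out.bmp", "wb").write(pdf_rgb_to_bmp(data, w, h))` writes the result to a file, where `data`, `w` and `h` come from the extracted image.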
The use of XFA inside PDF isn't only for creating forms
In short: I need valid test cases for a new XFA PDF reader, but I couldn't find any, nor could I find out how to use Ghostscript to create such test cases in batch.
The point is that I don't know how to build the extra information Ghostscript should handle without a hex editor.
Ghostscript doesn't handle XFA at all, neither on input nor on output, so you cannot use Ghostscript to create XFA files.
Nor does Ghostscript (currently) create PDF files which solely consist of an image. Even if it did, these wouldn't be PNG or TIFF images, as those file formats are not directly supported by PDF. The next release of Ghostscript will contain devices which produce PDF files where the content is a rendered bitmap image created from the input. But they won't be either PNG or TIFF file format.
Note that XFA has been removed from the PDF 2.0 specification (hardly surprising, as it's XML, not PDF format).
I'm working with PDFClown and I'm trying to extract images from a PDF file. I use the example code provided with the source code, ImageExtractionSample.java, which can be found at http://pdfclown.org.
The problem is the images are negative and flipped horizontally. Does anyone know how to resolve this problem?
Check whether other PDF files also give rotated or flipped images. ImageExtractionSample.java does not check rotation or matrix-defined transformations for the image object; it just writes the content to a file as is (so it will work for JPG images but not, for example, for CCITT-encoded images).
So there are things to consider when you extract image from PDF:
image can be rotated using the attached transformation matrix (CTM);
image can be rotated/transformed as part of the form which is transformed;
image can be placed without transformation on a page but the page itself is rotated;
image may contain the overlaid Mask on top of it (and the Mask can be rotated and transformed);
JPG image is stored pretty much as is, but PDF supports other formats too, such as CCITT compression, LZW-compressed images etc.
But the general suggestion is that when you extract a JPG image from a PDF using PDFClown, you should just flip and rotate the extracted images, as suggested on the SourceForge project discussion page.
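To decide whether an extracted image needs to be flipped or rotated, you can inspect the transformation matrix in effect where the image is drawn. The sketch below is plain Python, not PDFClown code; it assumes you have already obtained the six matrix values [a b c d e f] from the content stream, and the function name is hypothetical. It only reports mirroring and the rotation of the image's x-axis, and it ignores masks and the page's own rotation.

```python
import math


def describe_ctm(a, b, c, d, e=0.0, f=0.0):
    """Classify a PDF transformation matrix [a b c d e f] applied to an image."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("degenerate matrix: the image has no area")
    return {
        # A negative determinant reverses orientation, i.e. the image is mirrored
        "mirrored": det < 0,
        # Angle of the transformed x-axis, counter-clockwise, in [0, 360)
        "rotation_degrees": math.degrees(math.atan2(b, a)) % 360,
        # e and f only move the image and do not affect its orientation
        "translation": (e, f),
    }


print(describe_ctm(200, 0, 0, 100, 50, 50))   # upright image, not mirrored
print(describe_ctm(200, 0, 0, -100, 50, 150)) # mirrored top to bottom
```

A matrix such as [200 0 0 100 x y] leaves the image upright, so only matrices with a negative determinant or a non-zero rotation call for correcting the extracted file.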
If you could point to a particular PDF sample file, it would be easier to suggest a solution.
If you're on Windows then you may use this free PDF Multitool utility to compare non-transformed and transformed images from PDF using "Extract raw images (without transformation)" option in images extraction dialog.
Disclaimer: I work for ByteScout, the PDF Multitool utility is free for both commercial and non-commercial purposes.
I have issues rendering a couple of images and texts in a PDF in the Telerik PDF viewer. According to Telerik's documentation, it seems the formats of those texts/images are incompatible.
Is there a way to convert the existing images in a PDF and put them back into the PDF, so that the file becomes compatible with the Telerik PDF viewer?
Many thanks