I know there are tons of threads about this "out there", but all I can find is bitmap to PDF and how to add images to a PDF.
I have a PDF which I would like to convert to JPEG. I've tried iTextSharp, but I can only find information about creating a PDF, not the other way around. Any ideas or links to actual code?
ImageMagick uses Ghostscript to handle PDFs so if this is your only task I'd recommend just using Ghostscript. There's a managed wrapper here and you can get the Ghostscript binaries from here. They come in an installer but you can just extract them using 7-Zip. See this discussion on what you need to deploy in your app. You might have to play around with 32-bit vs 64-bit. Also, on the Ghostscript download page please read the "Which license is right for me?" section.
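For reference, here is a minimal sketch of the kind of Ghostscript call this answer points to. It is written in Python rather than .NET for brevity and assumes a `gs` executable on the PATH (on Windows the console binary is usually named `gswin32c` or `gswin64c`). The function names, file names, and the 150 dpi / quality 90 defaults are illustrative assumptions, not something from the thread.

```python
import subprocess


def build_jpeg_command(pdf_path, out_pattern="page-%03d.jpg", dpi=150,
                       quality=90, gs_binary="gs"):
    """Return the argv list that asks Ghostscript to render each page as a JPEG."""
    return [
        gs_binary,
        "-dSAFER",             # restrict file access while interpreting the PDF
        "-sDEVICE=jpeg",       # colour JPEG output device
        f"-r{dpi}",            # rendering resolution in dots per inch
        f"-dJPEGQ={quality}",  # JPEG quality, 0-100
        "-o", out_pattern,     # output name; -o also implies -dBATCH -dNOPAUSE
        pdf_path,
    ]


def pdf_to_jpeg(pdf_path, **options):
    """Run Ghostscript; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_jpeg_command(pdf_path, **options), check=True)
```

Calling `pdf_to_jpeg("input.pdf")` should write one file per page (`page-001.jpg`, `page-002.jpg`, and so on) because of the `%03d` in the output name. The same argument list can be passed to a process launcher in any other language.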
I'm trying to extract text from PDF files using the Google Cloud Vision API. It works most of the time, but I get gibberish in a few cases. I tried both DOCUMENT_TEXT_DETECTION and TEXT_DETECTION, and I tried forcing the language in the languageHints, but it didn't help.
Then I tried with a screenshot saved as tiff and this did work, so I'm guessing that Google tries to use the text in the PDF if it's not just a picture. Indeed, when I select all "text" in the PDF, I get gibberish.
When I print the TIFF back into PDF, text extraction works, so there really is something odd about the PDF itself. But other extraction software (such as ABBYY) works well with the original PDF.
Has anyone had the same kind of issues?
One thing that could help would be an option to force treat the PDF as an "image PDF". Is there such an option?
Thanks for your help!
FYI, I am unfortunately not allowed to show the PDF, and I use the dotnet library.
Edit:
The info on the PDF is:
Creator: "PScript5.dll Version 5.2.2"
Producer: Acrobat Distiller 10.1.16 (Windows)
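The workaround described above (turning the PDF into an image so that the embedded text layer is ignored) could be automated by rasterizing the file with Ghostscript before sending it for OCR. The following is only a sketch under assumptions: it is written in Python for brevity even though the question uses the .NET client, it assumes Ghostscript is installed as `gs`, and the function names, file names, and 300 dpi default are hypothetical. The Vision API call itself is not shown.

```python
import subprocess


def build_rasterize_command(pdf_path, tiff_path, dpi=300, gs_binary="gs"):
    """Return argv that renders every page of the PDF into one multi-page TIFF.

    The output contains only pixels, so a broken text layer cannot be picked up.
    """
    return [
        gs_binary,
        "-dSAFER",
        "-sDEVICE=tiff24nc",  # 24-bit RGB TIFF device
        f"-r{dpi}",
        "-o", tiff_path,      # no %d in the name, so all pages go into one file
        pdf_path,
    ]


def rasterize(pdf_path, tiff_path, **options):
    """Run Ghostscript; raises CalledProcessError if it exits with an error."""
    subprocess.run(build_rasterize_command(pdf_path, tiff_path, **options), check=True)
```

The resulting TIFF can then be submitted in place of the original PDF, which mirrors the manual screenshot test described in the question.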
I ran into the following problem using ImageMagick. I have a Java EE web application that needs to make thumbnails from uploaded PDFs. A third-party application uploads the PDFs to my application.
My application tries to convert this PDF (and others) with the following command:
convert some.pdf -auto-orient -resize "100x100>" some.png
As a result of the conversion I get this PNG. I've been trying for 3 days but haven't been able to figure out what's wrong with the uploaded PDFs. Other PDFs are converted correctly. What's wrong with these PDFs, and how do I convert them correctly?
Note: ImageMagick V6.8.6.6, Ghostscript 8.64.
Upgrading your version of Ghostscript should fix the problem. Ghostscript is responsible for creating an image from your PDF file. With the latest version (9.10) a correct image is created.
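To confirm which Ghostscript is actually installed before and after upgrading, something like the following sketch may help. It assumes the executable is reachable as `gs`, which may not be the same binary ImageMagick's delegate configuration points to. The helper names are made up for illustration, and the 9.10 threshold is simply the version mentioned in this answer.

```python
import subprocess


def parse_version(text):
    """Turn a version string such as '9.10' into a tuple of ints, e.g. (9, 10)."""
    return tuple(int(part) for part in text.strip().split("."))


def ghostscript_version(gs_binary="gs"):
    """Ask the installed Ghostscript for its version via `gs --version`."""
    result = subprocess.run([gs_binary, "--version"],
                            capture_output=True, text=True, check=True)
    return parse_version(result.stdout)


def is_at_least(version, minimum=(9, 10)):
    """Compare version tuples component by component."""
    return version >= minimum
```

With the version from the question, `is_at_least(parse_version("8.64"))` is false, so the thumbnails would still be rendered by the old interpreter.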
This is not strictly a programming question, but it's related to a programming task: I need to do this in order to make an iPhone app.
I have a PDF file with a large image (say, a campus map) which I want to store as a PNG image to include as a resource in the app. The image I want is much larger than the screen area (a lot larger, about 4000x4000 px), so I cannot just take a single screenshot of the PDF and save it as PNG. The only way I know to accomplish this is to take a number of screenshots of different parts of the image and manually stitch them together in an image editor. There will be 8-10 images to stitch together, if not more.
I wonder if anyone knows a more efficient way of doing this? Acrobat PDF reader does not allow this. Are there any tools or tricks in either Windows or MacOS I can use? Googling this did not bring anything that works.
Using the PDF directly would also be an option. iOS has pretty good support for reading PDFs; see the ZoomingPDFViewer sample code from Apple for an example.
As for your actual question, I'm not sure if there are existing tools that do exactly what you want here (though I'd guess there are), but it would also be pretty easy to make a small Cocoa command-line tool that converts a PDF to a number of bitmap tiles using Core Graphics.
You could use Ghostscript to convert your PDF to a PNG.
A command like
gs -sDEVICE=png16m -r600 -o my_Map.png my_Map.pdf
would give you a PNG rendered from the PDF at 600 dpi.
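The `-r` value determines the pixel size of the result. PDF page dimensions are measured in points of 1/72 inch by default, so the output size is points times dpi divided by 72. A small sketch of that arithmetic follows; the function names are made up, and the US Letter width of 612 points is only an example, since the actual page size of the map is not given in the question.

```python
POINTS_PER_INCH = 72  # default PDF user-space unit is 1/72 inch


def pixels_for(points, dpi):
    """Pixel length of a page dimension given in points, rendered at `dpi`."""
    return round(points * dpi / POINTS_PER_INCH)


def dpi_for(points, target_pixels):
    """Resolution needed so that `points` of page renders to `target_pixels`."""
    return target_pixels * POINTS_PER_INCH / points
```

For a page 612 points wide, `pixels_for(612, 600)` gives 5100 pixels, and `dpi_for(612, 4000)` comes out at roughly 470, so a somewhat lower `-r` value than 600 would already reach the 4000 px target on a page of that size.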
We are using Tesseract's Java library, called Tess4J, to convert PDF files to text.
It works nicely with TIFF files as well as one-page PDF files. With multi-page PDFs it does generate the output file, but when it reaches the last page, control doesn't seem to return to the application that invoked the doOCR call. It just hangs there without doing anything.
Is this an issue with the native call not returning? I have no clue.
Please let me know if there is a solution to this issue, as soon as possible.
Regards
Vish
Tess4J does support multi-page PDF and multi-page TIFF. Substitute with your PDF file in the unit test case and give it a try.
I have an issue printing a PDF file in an applet. I get the input over HTTP and the stream is constructed using PdfStamper. The problem is that I want to send the resulting stream to a printer, but I have not found out how to do that.
Unless the printer supports PDF, you cannot send it directly to the printer. You need to rasterize it. I wrote a blog article on printing PDFs from Java at http://www.jpedal.org/PDFblog/2010/01/printing-pdf-files-from-java/
PDFBox might manage it. I'm not aware of any other Java-specific PDF renderers out there, though I wouldn't be shocked to find there's a couple more out there.
Basically, any app that can convert a PDF to an image can probably act as a print driver.
Ghostscript, perhaps?