Magick++ - Reading JPEG2000 images - pdf

I'm trying to read JPEG2000 images in Magick++ (the C++ API of ImageMagick). To read an image I use the following code:
Image img("path/to/my/image.jp2");
But when I try to do this, ImageMagick throws an exception and doesn't load the image.
I extract the images from PDF files. Could it be that they differ from normal JPEG2000 images? To extract them, I read the streams of the image objects that have a JPXDecode filter and save each one to a file.
Hope someone can help me!

ImageMagick uses a package called JasPer to handle JPEG2000 images. According to the Wikipedia page on OpenJPEG, JasPer does not completely support the JPEG2000 specification. I have several extracted JPEG2000 images that open fine in QuickTime but fail to decode with ImageMagick.
I have had better results using OpenJPEG to decode the JPEG2000 images. The interface is less flexible, but it will convert to PNG and BMP.
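To check whether a stream extracted from a PDF differs from a regular .jp2 file, it can help to look at its first bytes. The sketch below is only a rough diagnostic, not part of ImageMagick or OpenJPEG: it distinguishes a JP2 container (which starts with the 12-byte signature box) from a raw JPEG2000 codestream (which starts with the SOC and SIZ markers). The function names are made up for illustration.

```python
# 12-byte JP2 signature box that opens every JP2 container file.
JP2_SIGNATURE = bytes.fromhex("0000000c6a5020200d0a870a")
# A raw codestream starts with the SOC marker (FF 4F) followed by SIZ (FF 51).
J2K_CODESTREAM = bytes.fromhex("ff4fff51")


def classify_jpeg2000(header: bytes) -> str:
    """Return 'jp2', 'j2k' or 'unknown' based on the leading bytes."""
    if header.startswith(JP2_SIGNATURE):
        return "jp2"
    if header.startswith(J2K_CODESTREAM):
        return "j2k"
    return "unknown"


def classify_file(path: str) -> str:
    """Hypothetical helper: classify a file on disk by its first 12 bytes."""
    with open(path, "rb") as handle:
        return classify_jpeg2000(handle.read(12))
```

If the result is "j2k" or "unknown", the .jp2 extension does not match the file's contents, which is one possible reason for a decoder to reject it.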

Related

Generate JPEG-YCbCr tiles in a GeoTIFF file with JFIF format instead of pure JPEG format

Currently, my app creates tiled GeoTIFF files using the following options:
PROFILE=GeoTIFF
TILED=YES
BLOCKXSIZE=xxx
BLOCKYSIZE=xxx
COMPRESS=JPEG
PHOTOMETRIC=YCBCR
JPEG_QUALITY=xx
However, some apps that use the tiles I serve do not work because of the "invalid" JFIF format.
How can I force GDAL to ensure JFIF format in GeoTIFF tiles?
See my own answer at https://gis.stackexchange.com/questions/426732/generate-jpeg-ycbcr-tiles-in-geotiff-file-with-jfif-format-instead-pure-jpeg-for/428023#428023.
Basically, the solution involves modifying the GDAL code.
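To see what a strict JFIF consumer is looking for, here is a minimal sketch that checks whether a JPEG byte stream opens with the JFIF APP0 segment. It assumes you already have the raw bytes of one tile; how you obtain them is not shown. The function name is made up for illustration and is not part of GDAL.

```python
def has_jfif_header(data: bytes) -> bool:
    """Check for SOI (FF D8) immediately followed by a JFIF APP0 segment."""
    return (
        len(data) >= 11
        and data[0:2] == b"\xff\xd8"     # SOI: start of image
        and data[2:4] == b"\xff\xe0"     # APP0 marker
        # data[4:6] holds the APP0 segment length and is not checked here
        and data[6:11] == b"JFIF\x00"    # JFIF identifier, NUL-terminated
    )
```

A tile whose stream goes straight from SOI to the quantization or frame segments fails this check even though it is a valid JPEG, which may be what the consuming apps report as "invalid" JFIF.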

Tesseract cannot recognize my image correctly

I am developing an Android app that needs to recognize captchas from a website.
I use tess-two to recognize the captchas and followed the TrainingTesseract3 instructions to train my own traineddata (using jTessBoxEditor to correct characters), but it recognizes them incorrectly or not at all.
The TIFF image below is the one I use to train Tesseract; I collected many captchas and merged them into one image.
TIFF image
The image that I want to recognize
For example, the expected result of the above image should be k8666, but the actual result is only 66.
Can anyone help me? Thanks.
I tried your images using a .NET wrapper for tesseract-ocr (Tesseract-ocr .Net Wrapper by Charliesw).
I got somewhat better results (K8EEE, K8656). I think you need to increase the text size and make it bold. I saved the image in TIFF format at 96 DPI resolution; with those changes you may get better results than mine.

phantomjs output file size: png v gif

With phantomjs you can choose the file format to use for page.render().
I'm finding that the file size I'm getting for png is around three times higher than what I'm getting for gif. I wasn't aware that png should be any worse (in terms of file size) than gif; in fact I thought png was meant to be better.
Unfortunately, I kinda need to output to png because of its support for variable opacity, but the larger file size is a bit of an issue.
So, is there any way in which I can control file size of the png? Maybe change the encoding scheme or something? I'm currently using phantomjs 1.9.8.
Inside of PhantomJS
No, there is no way to make the png file size smaller, but there is a way to make it bigger (just for fun):
Render the file to png,
load the file to a canvas of appropriate size,
get the Data-URI of the canvas in png or any other format,
decode the Base 64 part and write to file (this is very tricky to get right).
PhantomJS 1.x has a bug which results in a vastly inflated, but valid file.
Only jpeg rendering enables you to specify a quality setting which will result in a smaller file size, but then again jpeg doesn't support transparency.
You could also see whether PhantomJS 2.0.0 behaves better, because it has an engine underneath it that is almost three years newer than in PhantomJS 1.x.
Outside of PhantomJS
Your best bet would be to render the png in PhantomJS as-is and post-process it with your favorite library. It may even be enough to open it and save it again.
You can, for example, call an installed program with the child process module, or you can open a webpage that offers such a service and upload the captured file or a base64 representation of it. The possibilities are endless.
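Before picking a post-processing tool, it can help to check what kind of PNG was actually written. The sketch below is an illustration only and has nothing to do with PhantomJS itself: it reads the IHDR chunk at the start of a PNG file and reports the dimensions, bit depth and color type. The function name is made up.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
COLOR_TYPES = {
    0: "grayscale",
    2: "truecolor",
    3: "palette",
    4: "grayscale+alpha",
    6: "truecolor+alpha",
}


def describe_png(data: bytes) -> dict:
    """Parse the IHDR chunk, which must be the first chunk after the signature."""
    if len(data) < 26 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    # IHDR payload starts at byte 16: width, height (big-endian uint32),
    # then bit depth and color type (one byte each).
    width, height, bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return {
        "width": width,
        "height": height,
        "bit_depth": bit_depth,
        "color_type": COLOR_TYPES.get(color_type, "unknown"),
    }
```

An 8-bit "truecolor+alpha" file stores four bytes per pixel before compression, whereas GIF is limited to a palette of at most 256 colors, so some size difference between the two formats is to be expected.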

convert tiff to PDF using PDFClown

Is it possible to convert a TIFF file into a PDF file using PDFClown?
I've started a project using PDFClown, and I'm afraid I got stuck (maybe I'll have to switch to iText now...)
Thanks.
No, it isn't. PDFClown supports only JPEG images, as stated on their Features page.

PDFBox : Converting to image : Quality loss when converting PDF containing scanned documents

My use case is pretty simple: I need to convert PDFs to images. I tried Apache PDFBox, but I am having trouble converting PDFs that contain scanned images. When I convert a scanned image, its clarity is lost due to compression/scaling, so I tried extracting the image data from the PDF and storing it directly. The problem is that I may also get PDF files that contain both images and text, in which case I would need to fall back to image conversion mode.
How can I differentiate between pages/documents that contain only an image and those with composite data? I thought I could use the ProcSet definition for this purpose, but it is marked as obsolete and unreliable in the PDF specification. Another possibility is to check all the objects linked to the page and see whether it contains anything other than images. Please let me know if there is an easier way of doing this.
Thanks
If your intention is to convert PDFs to images, it is better to use ImageMagick for that. ImageMagick has a lot of options for changing the quality of the image, and converting a PDF to an image with it is pretty simple.
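The check described in the question (seeing whether a page draws anything other than images) can be sketched roughly as follows. This is not PDFBox code: it assumes the page's content stream has already been tokenized into a list of operator names by whatever PDF library is in use, and the function and set names are made up for illustration.

```python
# PDF content-stream operators, grouped by what they put on the page.
TEXT_SHOWING = {"Tj", "TJ", "'", '"'}
VECTOR_PAINTING = {"S", "s", "f", "F", "f*", "B", "B*", "b", "b*", "sh"}
IMAGE_DRAWING = {"Do", "BI"}  # XObject invocation and inline image


def is_image_only(operators) -> bool:
    """True if the page draws at least one image and no text or vector art."""
    used = set(operators)
    draws_image = bool(used & IMAGE_DRAWING)
    draws_other = bool(used & (TEXT_SHOWING | VECTOR_PAINTING))
    return draws_image and not draws_other
```

Two caveats: `Do` can also invoke a form XObject, so a real check would need to confirm that the referenced XObject has subtype Image, and scanned documents that went through OCR usually carry an invisible text layer, which this sketch would classify as composite.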