Is it possible to OCR a PDF and write the recognized text into the PDF itself (text under the image), instead of into a separate file?
Yes, it is possible.
First, convert the Google Cloud Vision response to an hOCR file using gcv2hocr:
gcv2hocr test.jpg.json output.hocr
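Then use hocr-tools to merge the hOCR data into a PDF. hocr-pdf pairs each JPEG image in <imgdir> with the hOCR file that has the same base name (e.g. test.jpg and test.hocr), so name the gcv2hocr output accordingly: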
hocr-pdf --savefile out.pdf <imgdir>
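For reference, hOCR is plain HTML in which each recognized word is an element with class ocrx_word whose title attribute carries a "bbox x0 y0 x1 y1" box in pixel coordinates. The Python sketch below (gcv_to_hocr is a hypothetical name) builds such a page from a Cloud Vision-style response, assuming the parsed JSON has a textAnnotations list whose first entry is the full text and whose remaining entries are single words. It only illustrates the format: it does no line grouping, and hocr-pdf expects words grouped inside ocr_line elements, so keep using gcv2hocr's output for the actual PDF.

```python
import html


def gcv_to_hocr(response: dict, width: int, height: int) -> str:
    """Build a minimal hOCR page from a Cloud Vision-style response dict.

    Assumption: response["textAnnotations"][0] is the full text and every
    later entry is one word with a boundingPoly of four vertices.
    """
    words = response.get("textAnnotations", [])[1:]  # skip the full-text entry
    spans = []
    for i, word in enumerate(words, start=1):
        vertices = word["boundingPoly"]["vertices"]
        # Zero-valued coordinates can be omitted from the JSON, so default to 0.
        xs = [v.get("x", 0) for v in vertices]
        ys = [v.get("y", 0) for v in vertices]
        bbox = f"bbox {min(xs)} {min(ys)} {max(xs)} {max(ys)}"
        spans.append(
            f'<span class="ocrx_word" id="word_{i}" title="{bbox}">'
            f'{html.escape(word["description"])}</span>'
        )
    page_bbox = f"bbox 0 0 {width} {height}"
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8"></head><body>\n'
        f'<div class="ocr_page" id="page_1" title="{page_bbox}">\n'
        + "\n".join(spans)
        + "\n</div></body></html>\n"
    )
```

Taking the min/max over all four vertices keeps the box axis-aligned even when the text is slightly rotated, which is what the hOCR bbox property describes.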
Currently, my app creates tiled GeoTIFF files using the following creation options:
PROFILE=GeoTIFF
TILED=YES
BLOCKXSIZE=xxx
BLOCKYSIZE=xxx
COMPRESS=JPEG
PHOTOMETRIC=YCBCR
JPEG_QUALITY=xx
However, some apps that consume the tiles I serve do not work because the tiles are not in a "valid" JFIF format.
How can I force GDAL to write JFIF-formatted JPEG data in GeoTIFF tiles?
See my own answer at https://gis.stackexchange.com/questions/426732/generate-jpeg-ycbcr-tiles-in-geotiff-file-with-jfif-format-instead-pure-jpeg-for/428023#428023.
Basically, the solution involves modifying the GDAL code.
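What such consumers usually look for is a JFIF APP0 segment right after the JPEG SOI marker. If patching GDAL is not an option, one alternative (not what the linked answer does) is to check and, if needed, insert that segment when serving each tile. The Python sketch below assumes each tile is already a complete JPEG stream; GDAL may keep shared tables in the TIFF JPEGTables tag (see its JPEGTABLESMODE creation option), in which case they would have to be merged in first. The function names are hypothetical.

```python
# A JFIF APP0 segment: marker, length, identifier, version 1.01,
# no density units (aspect ratio 1:1), and no thumbnail.
JFIF_APP0 = (
    b"\xff\xe0"          # APP0 marker
    b"\x00\x10"          # segment length: 16 bytes, counting these two
    b"JFIF\x00"          # identifier
    b"\x01\x01"          # version 1.01
    b"\x00"              # density units: 0 = aspect ratio only
    b"\x00\x01\x00\x01"  # X density = 1, Y density = 1
    b"\x00\x00"          # thumbnail width and height = 0
)


def has_jfif_header(data: bytes) -> bool:
    """True if the stream starts with SOI immediately followed by a JFIF APP0."""
    return (
        data[:2] == b"\xff\xd8"
        and data[2:4] == b"\xff\xe0"
        and data[6:11] == b"JFIF\x00"
    )


def add_jfif_header(data: bytes) -> bytes:
    """Insert a JFIF APP0 segment right after SOI, unless one is already there."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG stream (missing SOI marker)")
    if has_jfif_header(data):
        return data
    return data[:2] + JFIF_APP0 + data[2:]
```

JFIF only allows grayscale or YCbCr data, which matches the PHOTOMETRIC=YCBCR option used here.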
Is it possible to convert a TIFF file into a PDF file using PDFClown?
I've started a project using PDFClown, and I'm afraid I'm stuck (maybe I'll have to switch to iText now...)
Thanks.
No, it isn't. PDFClown supports only JPEG images, as stated on their Features page.
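Since only JPEG is accepted, a workflow that receives mixed input needs to detect TIFFs and convert them (with some other tool) before handing them to PDFClown. A minimal Python sketch of that check, using the standard file signatures; image_kind is a hypothetical helper, not part of PDFClown:

```python
def image_kind(data: bytes) -> str:
    """Classify image bytes by their leading signature."""
    if data.startswith(b"\xff\xd8\xff"):      # JPEG: SOI marker + next marker
        return "jpeg"
    if data[:4] in (b"II*\x00", b"MM\x00*"):  # TIFF: little- or big-endian header
        return "tiff"
    return "unknown"
```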
I am working on an iPad application in which I have to show PDF data in a table. First, I want to fetch the PDF file's content into an NSString. How can I achieve this? I have tried a lot, but I am unable to get it to work.
Thanks
You have to use Quartz 2D.
Check this page of the Quartz 2D Programming Guide; it covers everything you need to open and parse a PDF file in iOS. Note that it is not a simple task, since there is no method that extracts the full text in one call. You have to work with the page data as an input stream, using a CGPDFScanner.
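To give an idea of what scanning involves: text in a page content stream is drawn by operators such as Tj, TJ, ' and ", each preceded by its string operands, and with CGPDFScanner you register a callback per operator and pop those operands. The language-neutral Python sketch below does the same on an already-decompressed content stream (extract_text is a hypothetical name). It handles only literal strings, ignores hex strings and font encodings (strings in CID fonts are glyph codes, not characters), and joins pieces with spaces instead of using the real text positions.

```python
import re

_OPERATOR = re.compile(r"[A-Za-z*'\"]+")
_ESCAPES = {"n": "\n", "r": "\r", "t": "\t", "b": "\b", "f": "\f",
            "(": "(", ")": ")", "\\": "\\"}
_TEXT_OPERATORS = {"Tj", "TJ", "'", '"'}


def _read_literal(stream: str, i: int) -> tuple[str, int]:
    """Read a literal string starting at stream[i] == '('; return (text, next index)."""
    depth, out, i = 1, [], i + 1
    while i < len(stream):
        c = stream[i]
        if c == "\\":
            nxt = stream[i + 1] if i + 1 < len(stream) else ""
            out.append(_ESCAPES.get(nxt, nxt))  # octal escapes not handled here
            i += 2
            continue
        if c == "(":
            depth += 1  # balanced parentheses may appear unescaped
        elif c == ")":
            depth -= 1
            if depth == 0:
                return "".join(out), i + 1
        out.append(c)
        i += 1
    raise ValueError("unterminated string")


def extract_text(content: str) -> str:
    """Collect the string operands of text-showing operators, in stream order."""
    pieces, pending, i = [], [], 0
    while i < len(content):
        if content[i] == "(":
            text, i = _read_literal(content, i)
            pending.append(text)  # operand waits for its operator
            continue
        m = _OPERATOR.match(content, i)
        if m:
            if m.group(0) in _TEXT_OPERATORS:
                pieces.append("".join(pending))  # TJ arrays may hold several strings
            pending = []  # every operator consumes the operands before it
            i = m.end()
            continue
        i += 1  # numbers, names' slashes, brackets, whitespace
    return " ".join(pieces)
```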
I'm trying to read JPEG2000 images in Magick++ (the C++ API of ImageMagick). To read an image, I use the following code:
Image img("path/to/my/image.jp2");
But when I try to do this, ImageMagick throws an exception and doesn't load the image.
I extract the images from PDF files. Could it be that something's different from normal JPEG2000 images? To extract the images, I read the stream of Image objects that have a JPXDecode filter and save it to a file.
Hope someone can help me!
ImageMagick (at least in older builds) uses a package called JasPer to handle JPEG2000 images. According to the Wikipedia page on OpenJPEG, JasPer does not completely support the JPEG2000 specification. I have several extracted JPEG2000 images that open fine in QuickTime but fail to decode with ImageMagick.
I have had better results using OpenJPEG to decode the JPEG2000 data. The interface is less flexible: it converts to formats such as PNG and BMP.
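Another difference worth ruling out: the data behind a JPXDecode filter is not always a complete JP2 file. Some PDFs embed a raw JPEG2000 codestream, which starts with the SOC and SIZ markers (FF 4F FF 51) instead of the 12-byte JP2 signature box, and saving it with a .jp2 extension can make a decoder try the wrong format. A small Python check (jpx_extension is a hypothetical name) can choose the extension before saving:

```python
# 12-byte JP2 signature box: length 12, type "jP  ", content 0D 0A 87 0A.
JP2_SIGNATURE = b"\x00\x00\x00\x0cjP  \r\n\x87\n"
# A raw codestream begins with SOC (FF 4F) immediately followed by SIZ (FF 51).
J2K_SOC_SIZ = b"\xff\x4f\xff\x51"


def jpx_extension(data: bytes) -> str:
    """Return the file extension matching JPXDecode stream data."""
    if data.startswith(JP2_SIGNATURE):
        return ".jp2"
    if data.startswith(J2K_SOC_SIZ):
        return ".j2k"
    raise ValueError("not a JP2 file or JPEG2000 codestream")
```

If it reports .j2k, save the file with that extension, or pass an explicit format prefix such as "J2K:path/to/my/image.j2k" to the Magick++ Image constructor. Keep in mind that some image properties (for example a soft mask) may live in the PDF image dictionary rather than in the JPEG2000 data.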
I want to create an HTML file with which I could read any PDF file by providing the source of that PDF file. How can I do this using only HTML5?
For example, I want to read a PDF file that is on my C drive, so src="http://virdir/mypdf.pdf".
I want something like this.
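You want to use the HTML5 File API, which is still being developed. Mozilla has a good explanation, and you can also refer directly to the spec. Note that a web page cannot read an arbitrary path on the user's disk; the user has to select the file, for example through an <input type="file"> element.
Since PDF is a binary format, you will probably want to use FileReader.readAsBinaryString() (or FileReader.readAsArrayBuffer(), which gives you the raw bytes directly).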
Parsing and rendering a PDF in JavaScript (e.g. to a canvas) is possible, but it would be very challenging.
Here is an open-source PDF reader written in JavaScript:
https://github.com/mozilla/pdf.js
There are APIs available to play with. It is built into the Firefox browser and has good support from the Mozilla community.