High-quality alternatives to PDF

We are trying to figure out the best way to create a web service that delivers high-quality textbooks to remote tablet and desktop clients. The books are copyrighted and sold to users, so the delivery must be protected against copying as much as possible. The books' layout is very complicated, with lots of images, pictures, textures, tables, diagrams and the like. They are produced in InDesign and exported as PDF.
So far, our best guess is to store the PDF as single pages (one PDF per page) and encrypt them with asymmetric keys, so that all decryption can be done in memory without generating any temporary files.
Our concern is that PDF is a proprietary format, and sometimes the files are too big (quality is an important concern for the client).
Is there any Open Source alternative to PDF, capable of delivering high quality, complicated layouts in smaller files?

If the document has to be viewable offline, your only option is to encrypt it and issue licence keys that unlock it for viewing.
There are commercial packages that let you do this and limit the licence to a machine, a user or a time period.
Ultimately, you can't stop people from coming up with ingenious ways of copying it; you can only make it more difficult.
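As an illustration of the licence-key idea, here is a minimal Python sketch, assuming a licence that binds a user name, a machine ID and an expiry date. The names `issue_licence`, `check_licence` and the `SECRET` value are hypothetical. It only shows how such terms can be made tamper-evident; it does not encrypt the document itself, and it uses a shared HMAC secret purely to stay within the standard library, whereas a real product would sign keys with a private key and ship only the public key to clients.

```python
import base64
import hashlib
import hmac
import json
from datetime import date

# Hypothetical vendor secret. HMAC keeps this sketch stdlib-only; a real
# scheme would sign with a private key and verify with a public key, so the
# client never holds the signing secret.
SECRET = b"vendor-secret-key"


def issue_licence(user: str, machine_id: str, expires: date) -> str:
    """Encode the licence terms and append an HMAC-SHA256 tag."""
    payload = json.dumps(
        {"user": user, "machine": machine_id, "expires": expires.isoformat()},
        sort_keys=True,
    ).encode()
    tag = hmac.new(SECRET, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode() + "."
            + base64.urlsafe_b64encode(tag).decode())


def check_licence(key: str, user: str, machine_id: str, today: date) -> bool:
    """Accept the key only if the tag matches and user, machine and date fit."""
    try:
        payload_b64, tag_b64 = key.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        tag = base64.urlsafe_b64decode(tag_b64)
    except ValueError:  # wrong number of parts or invalid base64
        return False
    expected = hmac.new(SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return False  # terms were altered or the key was forged
    terms = json.loads(payload)
    return (terms["user"] == user
            and terms["machine"] == machine_id
            and today <= date.fromisoformat(terms["expires"]))
```

For example, a key issued for `"alice"` on `"MACHINE-42"` until 1 January 2030 passes `check_licence` on that machine before that date and fails on any other machine or after expiry. As the answer says, this raises the bar rather than preventing copying: anyone who can patch the client can skip the check.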

You can use high-quality raster images as an alternative to PDF.
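Rasters trade file size against resolution, though. A rough Python sketch of the uncompressed size of one page rendered as an image, assuming page dimensions in millimetres and 8-bit RGB by default (the function name is hypothetical):

```python
MM_PER_INCH = 25.4


def raster_page_bytes(width_mm: float, height_mm: float, dpi: int,
                      channels: int = 3, bits_per_channel: int = 8) -> int:
    """Uncompressed size in bytes of one page rendered as a raster image."""
    # Convert the physical page size to pixels at the chosen resolution.
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px * height_px * channels * bits_per_channel // 8
```

For an A4 page (210 × 297 mm) at 300 dpi this gives 2480 × 3508 × 3 = 26,099,520 bytes, about 26 MB before compression, so the final file size depends heavily on the image compression chosen, and rasterized text loses the sharpness of vector rendering when zoomed in.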

Does PDF support data degradation protection?

We can add signatures to PDF files, which sign a hash of the document's content.
However, if a single bit flips due to bitrot, the file will be corrupt and the signature worthless.
Does PDF have some built in data integrity protection that would allow it to repair bitrot to a certain degree?
I'm aware that this can be achieved at the filesystem level, but I wonder whether the PDF format itself also has facilities for this and, if so, how they can be enabled and whether they are included in PDF/A.
No. Quite the contrary: data streams in PDFs may be (and often are) compressed with the Flate filter (FlateDecode). In an uncompressed content stream, a bit flip usually damages only one or two instructions, often affecting only a small part of the page rendering. In a compressed content stream, however, it usually damages all instructions from the flip onward. If this happens early in the stream, the whole page can no longer be rendered.
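The difference between uncompressed and Flate-compressed streams can be reproduced with Python's `zlib` module, which implements the same zlib/deflate format that `FlateDecode` uses. This sketch builds a synthetic text content stream (not taken from a real PDF), flips one bit in the raw and in the compressed version, and also shows why a hash-based signature cannot survive either kind of damage. All function names are hypothetical.

```python
import hashlib
import zlib


def make_content_stream(lines: int = 200) -> bytes:
    """Build a synthetic, uncompressed PDF-like text content stream."""
    return b"".join(
        b"BT /F1 12 Tf 72 %d Td (Line %d) Tj ET\n" % (800 - 3 * i, i)
        for i in range(lines)
    )


def flip_bit(data: bytes, byte_index: int, bit: int = 0) -> bytes:
    """Return a copy of data with one bit inverted."""
    damaged = bytearray(data)
    damaged[byte_index] ^= 1 << bit
    return bytes(damaged)


def recover_flate(data: bytes) -> bytes:
    """Inflate byte by byte, keeping whatever was produced before an error."""
    inflater = zlib.decompressobj()
    out = bytearray()
    try:
        for i in range(len(data)):
            out += inflater.decompress(data[i:i + 1])
        out += inflater.flush()
    except zlib.error:
        pass  # the stream cannot be decoded past this point
    return bytes(out)


def intact_prefix(original: bytes, damaged: bytes) -> int:
    """Number of leading bytes that still match the original."""
    n = 0
    for a, b in zip(original, damaged):
        if a != b:
            break
        n += 1
    return n


def demo() -> dict:
    content = make_content_stream()
    original_digest = hashlib.sha256(content).digest()

    # Uncompressed: flip bit 0 of byte 10, which turns "Tf" into "Uf".
    raw_damaged = flip_bit(content, 10)
    raw_wrong_bytes = sum(a != b for a, b in zip(content, raw_damaged))

    # Flate-compressed: flip the same bit position in the compressed data.
    packed = zlib.compress(content, 9)
    inflated = recover_flate(flip_bit(packed, 10))

    return {
        "raw_wrong_bytes": raw_wrong_bytes,
        "flate_intact_prefix": intact_prefix(content, inflated),
        "content_length": len(content),
        "raw_signature_ok": hashlib.sha256(raw_damaged).digest() == original_digest,
    }
```

In the raw stream the flip affects a single operator; in the compressed stream, everything decoded after the damaged position is lost or garbled. The SHA-256 comparison stands in for the digest that a PDF signature covers, which fails in both cases.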

PDF data extraction

Is there a way to take a scanned PDF image and extract data from it by highlighting the fields that are needed? We scan thousands of PDF images of real estate deeds daily and would like to automate the data entry process. The problem we are facing is that no two deeds are the same.
It has been said in the comments that Stack Overflow is mainly about programming issues.
Nevertheless, there are possibilities, depending on the actual documents and the volumes to be processed.
On the high end, there is a product called Teleform, originally developed by Cardiff and now owned by HP, which is used to process paper forms; you may also look at the business process application Cardiff LiquidOffice, now HP LiquidOffice.
On the low end, I have developed a PDF application running under Acrobat that can take a scanned and OCRed form and transfer the data to a specially prepared fillable form, from which the data can be exported to a database, for example. For more information, a demo and a quote, feel free to contact me privately.
If you want to develop something with Acrobat yourself, you could also start with an OCRed document, use the Redaction function (or the industrial-strength redaction tool Redax by Appligent) to find keywords, and then use the positional information of those keywords to extract more data.
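The keyword-plus-position approach can also be sketched outside Acrobat. This Python sketch assumes the OCR output is available as a list of words with bounding boxes in page coordinates (the `Word` class and `value_right_of` helper are hypothetical, and y is assumed to grow downwards); it finds a label such as "Grantor" and returns the words on the same line to its right.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """One OCRed word and its bounding box in page units (hypothetical)."""
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


def value_right_of(words: list, label: str, max_distance: float = 200.0) -> str:
    """Join the words on the label's line that start to the right of it."""
    anchors = [w for w in words if w.text.rstrip(":").lower() == label.lower()]
    if not anchors:
        return ""
    a = anchors[0]
    mid_y = (a.y0 + a.y1) / 2  # vertical centre of the label word
    same_line = [
        w for w in words
        if w.y0 <= mid_y <= w.y1           # overlaps the label's line
        and w.x0 > a.x1                    # starts to the right of the label
        and w.x0 - a.x1 <= max_distance    # within the search window
    ]
    return " ".join(w.text for w in sorted(same_line, key=lambda w: w.x0))
```

Because the lookup is relative to the label rather than to fixed page coordinates, it tolerates fields that move around from one deed to the next, as long as the label text itself is recognized correctly.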

PDF tools to analyze PDF attributes

Are there any PDF tools that report the loading time and memory usage needed to display a PDF in a browser, as well as the total number of elements in the PDF?
Unfortunately, not really. I've done some research on this, not for PDF in a browser but (and perhaps this is what you are looking at as well) for PDF on mobile devices.
A number of factors contribute, and to some extent they can be tested for:
Whether big images exist in the PDF, and what their resolution is. This is directly linked to memory usage.
Which compression method is used for the images. Decompressing JPEG 2000 images in particular can increase load time significantly. Even worse, because JPEG 2000 can be decompressed progressively, the PDF can look really bad until the images have been fully decompressed and loaded (this is especially ugly on somewhat older tablets, for example).
How complex the transparency effects used in the document are.
How many fonts are used in the document.
How many line-art objects (vector elements) with a large number of nodes (points) are used on a page.
You can test what is in the document to some extent with Acrobat Pro: there is a well-hidden tool, available when you save an optimized PDF file ("Audit space usage" in the PDF Optimizer), that reports which objects use how much of the space in a PDF document. You can also use a preflight solution such as pdfToolbox from callas (I'm affiliated with this company) or PitStop from Enfocus; these tools let you get a report with the results of custom checks such as image resolution, compression, vector objects, color spaces, etc.
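For a first rough audit without a commercial tool, some of these factors (the number of font dictionaries, image filters such as `JPXDecode`, and the decoded size of images) can be read straight from the object dictionaries. This Python sketch is a crude regex scan, not a PDF parser: it assumes classic uncompressed object syntax (it misses objects inside PDF 1.5+ object streams), does not resolve indirect references, and guesses 3 colour components and 8 bits per component when they are not given. The `audit` function name is hypothetical.

```python
import re
from collections import Counter

OBJ_RE = re.compile(rb"(\d+)\s+(\d+)\s+obj(.*?)endobj", re.S)
COMPONENTS = {b"DeviceGray": 1, b"DeviceRGB": 3, b"DeviceCMYK": 4}


def _int(pattern: bytes, text: bytes, default: int) -> int:
    """First integer captured by pattern, or default if it is absent."""
    m = re.search(pattern, text)
    return int(m.group(1)) if m else default


def audit(pdf: bytes) -> dict:
    """Count font dictionaries and images, and estimate decoded image memory."""
    fonts = 0
    filters = Counter()
    images = []
    for m in OBJ_RE.finditer(pdf):
        # Only look at the dictionary, not at any stream data after it.
        head = m.group(3).split(b"stream", 1)[0]
        if re.search(rb"/Type\s*/Font\b", head):  # excludes /FontDescriptor
            fonts += 1
        if not re.search(rb"/Subtype\s*/Image\b", head):
            continue
        width = _int(rb"/Width\s+(\d+)", head, 0)
        height = _int(rb"/Height\s+(\d+)", head, 0)
        bpc = _int(rb"/BitsPerComponent\s+(\d+)", head, 8)  # assumed default
        cs = re.search(rb"/ColorSpace\s*/(\w+)", head)
        comps = COMPONENTS.get(cs.group(1), 3) if cs else 3  # assumed default
        flt = re.search(rb"/Filter\s*\[?\s*/(\w+)", head)
        filters[flt.group(1).decode() if flt else "none"] += 1
        images.append({"width": width, "height": height,
                       "decoded_bytes": width * height * comps * bpc // 8})
    return {
        "fonts": fonts,
        "images": images,
        "filters": filters,
        "decoded_image_bytes": sum(i["decoded_bytes"] for i in images),
    }
```

The `decoded_bytes` figure approximates the memory needed to hold each image once decompressed, which is what drives memory use when a page is rendered, regardless of how small the compressed stream is in the file.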

Reducing the size of a PDF generated from software using proprietary fonts

I am trying to bring an Indian magazine online. The magazine is typeset in CorelDRAW using the proprietary Devanagari font Shree-Lipi (http://www.modular-infotech.com/html/shreelipi.html). The vendor provides a USB dongle that has to be attached to the machine whenever you want to access the fonts, and this software has been in use for the past 10 years.
To put the magazine online, we've tried converting it to PDF (by printing). The resulting PDF is on the order of 30-50 MB, even though it does not contain a single image. I am guessing it converts all the text into an image.
Given its size, it would be really difficult for users to read this magazine. When I convert it to .swf format (to add flipbook-style functionality), the size drops to 5-6 MB. But some people like to download the magazine and read it later, and I have had no luck reducing the size of the PDF.
I have done a lot of research on the web. Printing to PostScript and using PrimoPDF do not help much. The best I could get was a 30% reduction using the DocuCom PDF printer, but the file is still 20 MB. I have tried playing with resolution, compression and quality, but the best I could get was 18 MB.
Ideally, I would like to reduce it to less than 2 MB.
I would be really grateful if you could help me reduce the size of the PDF! Considering that it has no images, I am hopeful that it can be compressed really well.
The (35MB) magazine can be downloaded from: http://merajhola.in/jin-march.pdf
I can't see any easy way to reduce the size of this PDF. There are no embedded fonts, and all the text is drawn using vector graphics primitives. No amount of tweaking resolution, compression and quality will improve it significantly.
One possible option would be to embed the font as a subset instead of drawing the text as vector graphics. That would almost certainly make a big difference; however, I doubt the proprietary font license will allow it.
I'm sorry, but this Shree-Lipi setup just sounds wrong in 2012. It would be much better to use proper OpenType fonts with modern (e.g. InDesign) or free (e.g. LuaTeX) software.
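One way to confirm that text is drawn as vector outlines rather than with a font is to count operators in the page content streams: outlined text shows up as many path-construction operators (`m`, `l`, `c`, `re` and so on) and few or no text-showing operators (`Tj`, `TJ`). A crude Python sketch, assuming direct `/Length` values, Flate-compressed content streams and a whitespace tokenizer that does not understand PDF strings (so words inside parentheses can be miscounted); `count_operators` is a hypothetical name.

```python
import re
import zlib

# A stream keyword right after the end of its dictionary.
STREAM_START = re.compile(rb">>\s*stream\r?\n")
# Direct /Length value; group 2 is set when the length is an indirect reference.
LENGTH_RE = re.compile(rb"/Length\s+(\d+)(\s+\d+\s+R)?")
PATH_OPS = {b"m", b"l", b"c", b"v", b"y", b"h", b"re"}  # path construction
TEXT_OPS = {b"Tj", b"TJ", b"'", b'"'}  # text showing


def count_operators(pdf: bytes) -> dict:
    """Count path-construction vs text-showing operators in Flate streams."""
    counts = {"path": 0, "text": 0}
    for m in STREAM_START.finditer(pdf):
        # Take the dictionary from the nearest preceding "obj" keyword.
        head = pdf[pdf.rfind(b"obj", 0, m.start()):m.start()]
        length = LENGTH_RE.search(head)
        if (not length or length.group(2)
                or b"/FlateDecode" not in head
                or re.search(rb"/Subtype\s*/Image", head)):
            continue  # skip indirect lengths, non-Flate and image streams
        start = m.end()
        raw = pdf[start:start + int(length.group(1))]
        try:
            content = zlib.decompressobj().decompress(raw)
        except zlib.error:
            continue
        # Naive whitespace tokenizer: does not parse strings, arrays or comments.
        for token in content.split():
            if token in PATH_OPS:
                counts["path"] += 1
            elif token in TEXT_OPS:
                counts["text"] += 1
    return counts
```

A page whose count is dominated by path operators with almost no text operators is drawing its glyphs as outlines, which matches the diagnosis above and explains why resolution or image compression settings make little difference.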

Which version of PDF for general use/distribution?

I do a lot of quick-and-dirty PDF creation of long documents (100+ pages), for distribution to clients; my clients are often individuals, but sometimes corporate managers at banks and insurance companies.
Acrobat Pro allows you to save in many versions of PDF, from Acrobat 4 to Acrobat 10. Which should I use as a general rule?
I don't often use advanced features in my documents: usually just pictures and text. Since I send them via email, I want the best compression possible; my documents often have lots of images. However, since my clients are banks and the like, not cutting-edge technologists, I don't think they have the most recent Acrobat/PDF reader installed.
What is the best PDF version, as a compromise between document compression and widespread adoption?
I recommend PDF 1.4 (Acrobat 5). The PDF/A-1 (PDF for archiving) standard is also based on PDF 1.4.
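To check which version an existing file declares, the header line is enough for a quick test. A minimal Python sketch (function names hypothetical); note that from PDF 1.4 onwards the document catalog may contain a `/Version` entry that overrides the header, which this sketch does not read, and the 1024-byte search window is an assumption about how tolerant readers are of leading junk.

```python
import re

HEADER_RE = re.compile(rb"%PDF-(\d)\.(\d)")


def header_version(pdf: bytes):
    """Return (major, minor) from the %PDF-x.y header, or None if absent."""
    # Assumption: only the first 1024 bytes are searched for the header.
    m = HEADER_RE.search(pdf[:1024])
    return (int(m.group(1)), int(m.group(2))) if m else None


def fits_pdf14(pdf: bytes) -> bool:
    """True if the file header claims PDF 1.4 or earlier."""
    version = header_version(pdf)
    return version is not None and version <= (1, 4)
```

Reading the first kilobyte of each outgoing file is enough to flag documents that were accidentally saved with a newer compatibility setting.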