Load auto paged pdf in iOS like iBook - objective-c

In iBooks, when you open a PDF, the app appears to reformat and repaginate it automatically. For example, a document that has 5 pages on an iPhone has only 2 pages when viewed on an iPad.
When you change the text size, the pagination also updates automatically.
How can I do this using CGPDFDocumentRef?

I'm assuming you are talking about Apple iBooks on the iPad? Are you sure you are observing the behavior of a PDF and not an ePub file?
The native format of iBooks is either ePub or the format created by iBooks Author.
PDF files are usually (in the vast majority of cases) used in a non-reflowing way. Reproducing the exact visual appearance of pages - explicitly without reflow - is exactly why PDF was invented.
There are constructs you can add to PDF files to make them a little more like formats such as HTML and ePub; these constructs can tag text with styles and logically define paragraphs, columns, tables and so on. Usually they are used to make a PDF file suitable for long-term archiving (according to the ISO PDF/A standard) or accessible (suitable for reading by screen-reader software for vision-impaired people, for example). Such a PDF file is commonly referred to as a tagged PDF.
As far as I know iBooks doesn't actually support tagged PDFs (meaning, it doesn't use the information in such a PDF file to reflow the file). And as far as I know you cannot create the necessary tags and structure with the built-in iOS library.
If your target app is iBooks, you'd probably be better off looking into generating ePub...

Related

PDF - Auto adjust for mobile?

Does anyone know if there is a function in PDF's to allow them to auto-adjust the view depending on whether it is on a desktop or mobile? Or even by screen size?
I am looking to prepare PDF material for distribution. However, the user group includes a mix of desktop and mobile users, so instead of creating two PDFs I would like to have a single PDF which adapts to the user's screen.
This is not possible with PDF files up to PDF version 1.7, the most commonly used one out there.
PDF 2.0, which was released three years ago, has such a feature, but it depends on the viewer implementing it and on the PDF writer correctly annotating the PDF. I guess there are PDF viewers out there that can already do this, but I'm not specifically aware of any.
If I were you, I would write the document in a format like LaTeX that can easily be converted to both kinds of PDFs, one for desktop and one for mobile.

How is hidden text stored in OCR-enhanced PDF files

// EDIT 26.03.2018 - Anyone who wants to continue my work can have a look at my source files: https://github.com/n0l0cale/ocr-sampledata
I'm actually looking for some details about PDF Files. It's most important for me that the files will be usable for a very long time and if possible the OCR should be automatically applied for new files (which seems to be not really possible with Adobe Acrobat...).
To that end, I've been looking at different ways to OCR my PDF files. I found three candidates which seem to do what they should (more or less). All three have their pros and cons, and each seems to take a different approach to storing the data in the PDF file. Let me explain:
a File OCRed with Adobe Acrobat:
https://github.com/n0l0cale/ocr-sampledata/blob/master/A4%20sample_ACROBAT.pdf
results in a file that Acrobat is able to open in one step (no preloading of any background layer) and after a preflight-script I'm able to see the text which is stored hidden:
a File OCRed with ABBYY FineReader:
https://github.com/n0l0cale/ocr-sampledata/blob/master/A4%20sample_ABBY.pdf
does not seem suitable for the default Adobe preflight script, as it does not display any additional layers:
But as far as I was able to reproduce, these files seem to have a background text layer which contains the OCRed text and lies underneath the image that is finally shown to the user. Unfortunately this layer seems to be loaded separately, which is confusing when opening the file with Adobe Acrobat...
a File OCRed with Tesseract 4 (Alpha):
https://github.com/n0l0cale/ocr-sampledata/blob/master/A4%20sample_TESSERACT_oem2.pdf
is also doing some weird magic with the hidden text part:
But in all three cases I'm able to search for words in the files and see the text using "Remove hidden information" and selecting "hidden text":
I'm seriously confused.... Does anyone know how these programs are storing their hidden text information really?
S.
P.S.: For those wondering what this ominous preflight script is: https://theblog.adobe.com/hidden-gems-in-acrobat-dc-how-to-optimize-hidden-ocr-text/
Does anyone know how these programs are storing their hidden text information really?
You have correctly found out that the approach of ABBYY FineReader differs from that of Adobe Acrobat and of Tesseract:
FineReader creates a page content stream in which the text is first drawn normally on the page and then covered by the scanned image.
Acrobat and Tesseract create content streams in which the image is drawn first and the text is then drawn invisibly (using text rendering mode 3, which draws nothing).
The difference between the latter two results is the choice of font used:
Acrobat uses regular standard 14 fonts for which a PDF viewer has a font program to render them as normal glyphs.
Tesseract uses a font named GlyphLessFont, for which it embeds a font program in the result file. When rendered, the glyphs in this font do not show as our normal Latin glyphs but merely as empty space.
Considering the visual effect you observed for the FineReader result, the approach used by Acrobat or Tesseract might be preferable.
Whether one prefers fonts with visually recognizable glyphs (as used by Acrobat) or without (as used by Tesseract), is mostly a mere matter of taste. They are used only in the invisible rendering mode anyways.
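To make the two orderings concrete, here is a minimal Python sketch that only assembles the page content stream operators as text. It does not produce a complete PDF file. The resource names Im0 and F1, the page size and the text position are assumptions chosen for illustration; real OCR tools position each word individually.

```python
def escape_pdf_string(text):
    """Escape the characters that are special inside a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def draw_image(image_name="Im0", width=595, height=842):
    """Operators that paint an image XObject scaled to the given size.

    Im0 and the A4-sized default dimensions are hypothetical placeholders.
    """
    return ["q", f"{width} 0 0 {height} 0 0 cm", f"/{image_name} Do", "Q"]


def draw_text(text, invisible, font_name="F1", font_size=12, x=72, y=720):
    """Operators that show one string, optionally in text rendering mode 3."""
    ops = ["BT"]
    if invisible:
        ops.append("3 Tr")  # mode 3: neither fill nor stroke the glyphs
    ops += [
        f"/{font_name} {font_size} Tf",
        f"{x} {y} Td",
        f"({escape_pdf_string(text)}) Tj",
        "ET",
    ]
    return ops


def image_then_invisible_text(text):
    """Image first, then invisible text (the Acrobat and Tesseract ordering)."""
    return "\n".join(draw_image() + draw_text(text, invisible=True))


def visible_text_then_image(text):
    """Normally drawn text, later covered by the image (the FineReader ordering)."""
    return "\n".join(draw_text(text, invisible=False) + draw_image())


if __name__ == "__main__":
    print(image_then_invisible_text("Hello OCR"))
    print()
    print(visible_text_then_image("Hello OCR"))
```

In both variants the text stays searchable; what differs is whether it is hidden by the rendering mode or by the image painted on top of it.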

How to merge PDFs into a PDFA1b with watermarks using iText5

Here is what I need to do:
Merge several PDF documents (which may or may not be PDFA) into one PDFA1b.
Add a watermark (a simple text label) on each page of the resulting PDF.
It has to be with iText 5
I have looked at this official merging example: http://developers.itextpdf.com/examples/merging-pdf-documents/adding-cover-page-existing-pdf
But can this method be used to create a PDFA, and also add watermarks?
Or am I stuck with using this other method which he specifically says not to use: http://developers.itextpdf.com/examples/merging-pdf-documents-itext5/how-not-merge-documents
You can create files that conform to PDF/A-1b with just about any PDF library including iText. PDF/A, in general, is a subset of ISO 32000 (PDF) so it's really just a matter of using the tool to do what you need to with the files but not adding anything that is forbidden by PDF/A-1b (in your case).
The thing to be aware of is that iText or any of the other libraries that "support" PDF/A, will not prevent you from modifying PDF in a way that is forbidden by PDF/A... you just need to know what those things are.
So... before merging, you'll want to be sure that the input files don't have any annotations or form fields or any other interactive content.
After merging, add your watermark as page content and be sure your XMP metadata is conforming and you should be OK.

How to create and save a .rtf, .doc, .docx in Objective-C for iOS

I am looking to create and save either a rtf, doc or docx file on an iPad (iOS).
The scenario is that we'd like to assist a user in creating content on their iPad and then let them email this as an editable document cross-platform (OS X, WIN).
I am open to other solutions besides the rtf, doc or docx file format.
Thanks,
James
RTF is going to be the easiest, because it's a plain text format. It's kind of like HTML, but without closing tags. Here is a class for writing an RTF, but it requires a lot of dependencies from elsewhere in the framework.
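As a rough illustration of how little is needed for a basic RTF file, here is a minimal sketch in Python (not the Objective-C class mentioned above). The function names and the Helvetica default are assumptions for this example; the same string building would carry over to any language.

```python
def rtf_escape(text):
    """Escape plain text for use inside an RTF document body."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)  # backslash and braces are RTF syntax
        elif ch == "\n":
            out.append("\\par\n")  # paragraph break
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # Non-ASCII: emit \uN? per UTF-16 code unit, where N is a
            # signed 16-bit decimal and '?' is the fallback character.
            units = ch.encode("utf-16-le")
            for i in range(0, len(units), 2):
                value = int.from_bytes(units[i:i + 2], "little")
                if value > 32767:
                    value -= 65536
                out.append(f"\\u{value}?")
    return "".join(out)


def build_rtf(text, font="Helvetica"):
    """Wrap escaped text in a minimal RTF header with one font at 12 pt."""
    header = "{\\rtf1\\ansi\\deff0{\\fonttbl{\\f0 " + font + ";}}"
    return header + "\\f0\\fs24 " + rtf_escape(text) + "}"


if __name__ == "__main__":
    # The result is pure ASCII, so it can be written out as bytes directly.
    with open("example.rtf", "wb") as handle:
        handle.write(build_rtf("Hello from the iPad.\nSecond paragraph.").encode("ascii"))
```

Bold, italics and other attributes are further control words in the same style, which is where most of the remaining work would be.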
DOCX would be rather difficult. It's actually a zip file, containing a few XML files. You can examine the format yourself by changing the .docx extension to .zip and unzipping it. But even though XML is a fairly easy to write format, the way the text attributes are organized is still rather complicated. Also, I recall that it has to be zipped in a very specific way to be read properly.
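If you would rather inspect the package programmatically than rename and unzip it by hand, something like the following Python sketch should work. It only reads an existing file; the function names are made up for this example, and word/document.xml is the part where the main body text normally lives.

```python
import zipfile


def list_docx_parts(source):
    """Return the sorted names of all parts inside a .docx package.

    source may be a file path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as package:
        return sorted(package.namelist())


def read_docx_part(source, name="word/document.xml"):
    """Return one part of the package decoded as UTF-8 text."""
    with zipfile.ZipFile(source) as package:
        return package.read(name).decode("utf-8")


if __name__ == "__main__":
    import sys

    # Usage: python inspect_docx.py some_document.docx
    path = sys.argv[1]
    for part in list_docx_parts(path):
        print(part)
    print(read_docx_part(path)[:500])
```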
As for DOC, it will be very difficult because it's such a complex format. You could look into some open source projects, like Abiword or Word2x. Be careful using their code because the licenses may not agree with the App Store rules.
I've seen doc & docx readers for iPhone (App store entry linked here), but I don't know of any open source frameworks you can make use of.
RTF format should be pretty simple to write, if you're up to the challenge. There is no built in framework support for it (here's a related question, b.t.w.).
Maybe you could write out something in a regular TEXT format and e-mail that?
Docmosis has a cloud service that you can reach from iOS. You can ask it to render a document in various formats (doc, rtf, pdf, odt etc.) and email it off or stream it back, though you have to be connected. Previewing DOC on iOS is possible but a little flaky. One option is to stream a PDF back for display on iOS and email the editable document (which can be done in one call).

A better file format than PDF or EPUB?

My client wants us to build a custom document viewer for their app. (It really, truly needs to be custom, because there are a ton of application-specific features they need.)
We built one for them last year that took PDFs, generated page images, and backed the images using a hidden layer of text that could be selected and copied. We did it in Flex. It was a nightmare. PDF is horrid.
This year, we need to build one in HTML 5 with similar requirements, except that most of the documents now are in Word or HTML, that is, they have reflowable text, instead of the fixed layout and glyphs of PDF. But they still want to do PDF in the same viewer.
I'm thinking that we need to convert all documents to some common file format that can handle both reflowable text and also the fixed-position glyphs of PDF. (Each document would probably support one or the other, but not both). It would be nice if it were an XML-like markup language that would say:
<text>here's some text</text>
-- or --
<glyph letter="a" name="my_a_glyph" position="10,10"/>
<image src="my_image" position="20,20"/>
or something like that.
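To show what I mean, here is a rough Python sketch of how a viewer might read such markup. The format itself is hypothetical: the wrapping page element, the function names, and the assumption that position is always "x,y" are made up for this example.

```python
import xml.etree.ElementTree as ET


def parse_position(value):
    """Turn an "x,y" attribute value into a pair of floats."""
    x, y = value.split(",")
    return float(x), float(y)


def parse_page(markup):
    """Parse one page of the hypothetical markup into (kind, data) tuples.

    Reflowable content comes back as ("text", string); fixed-position
    content comes back as ("glyph" or "image", attribute dict).
    """
    root = ET.fromstring(markup)
    items = []
    for node in root:
        if node.tag == "text":
            items.append(("text", node.text or ""))
        elif node.tag in ("glyph", "image"):
            attrs = dict(node.attrib)
            attrs["position"] = parse_position(attrs["position"])
            items.append((node.tag, attrs))
    return items


if __name__ == "__main__":
    sample = (
        "<page>"
        '<glyph letter="a" name="my_a_glyph" position="10,10"/>'
        '<image src="my_image" position="20,20"/>'
        "</page>"
    )
    for kind, data in parse_page(sample):
        print(kind, data)
```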
Is there any existing file format out there that can handle it? EPUB won't do the fixed-position text, and PDF sucks in too many ways to describe.
I think you can look at the FB2 (FictionBook 2) format. It is an XML-based format designed for publishing books. It includes images, though I am not sure if they can be positioned absolutely.
Also, you can simply go with HTML and do HTML-to-PDF rendering when needed (there exist various components and libraries for this). I don't see (or you have not listed) any reasons why this way doesn't work.
GROFF? Maybe build a macro library to customize it, as needed.
Groff/troff/nroff, the "run off" programs of Unix, can output to postscript or HTML. The jump from postscript to PDF is built in to some PDF viewers; there are also several existing programs for it, pstopdf, for example.
GROFF has some fixed layout options and some flow-like options. With GROFF, it's almost easier to base most of the printout on flowing text, within prescribed bounds.