Highlighting text? - pdf

I've been searching for a while to understand how text highlighting (in PDF, EPUB, or MOBI books) works programmatically and what kind of code or technology is behind this feature, but I couldn't find anything. If you know, please share any hints here.
Thanks in advance.

For PDFs you can highlight text by adding a highlight text markup annotation:
12.5.6.10 Text Markup Annotations
Text markup annotations shall appear as highlights, underlines, strikeouts (all PDF 1.3), or jagged (“squiggly”) underlines (PDF 1.4) in the text of a document. When opened, they shall display a pop-up window containing the text of the associated note. Table 179 shows the annotation dictionary entries specific to these types of annotations.
(ISO 32000-1)
You can find details in Table 179. For a specification of annotations in general, read the earlier subsections of section 12.5 Annotations.
Alternatively, you can add the highlighting to the page content itself. This has the disadvantage that others cannot easily change the highlight (which might actually be an advantage in some use cases). Depending on which PDF viewers you need to support, you might be forced to do this, though.
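As a rough illustration of what such an annotation looks like inside the file, here is a minimal Python sketch that builds the dictionary for one highlighted rectangle. The function name and its parameters are made up for this example; the keys (/Subtype /Highlight, /Rect, /QuadPoints, /C, /Contents) are the ones from ISO 32000-1. It assumes a single upright rectangle and an ASCII-only note. The corner order used for /QuadPoints is an assumption based on what viewers commonly write, and viewers are known to differ from the literal wording of the specification here, so check it against the viewers you target.

```python
def highlight_annotation(x0, y0, x1, y1, color=(1.0, 1.0, 0.0), note=""):
    """Return PDF dictionary syntax for a highlight over one rectangle.

    (x0, y0) is the lower-left and (x1, y1) the upper-right corner of the
    highlighted text, in default user space units (points).
    """
    def fmt(numbers):
        # Two decimals at most, without trailing zeros ("72.00" -> "72").
        return " ".join(f"{n:.2f}".rstrip("0").rstrip(".") for n in numbers)

    # Assumed corner order: top-left, top-right, bottom-left, bottom-right.
    quad = [x0, y1, x1, y1, x0, y0, x1, y0]

    # Escape the characters that are special inside a PDF literal string.
    escaped = note.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")

    return (
        "<< /Type /Annot /Subtype /Highlight "
        f"/Rect [{fmt([x0, y0, x1, y1])}] "
        f"/QuadPoints [{fmt(quad)}] "
        f"/C [{fmt(color)}] "
        f"/Contents ({escaped}) >>"
    )


print(highlight_annotation(72, 700, 200, 712, note="important"))
```

The resulting dictionary still has to be written as an indirect object and referenced from the page's /Annots array; a PDF library normally takes care of that part.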

Related

Determine the Text that can Display in Multiline PDTextField

Is there a way to determine the text that will actually display in a PDTextField when the PDF prints? If I call setValue and then getValue, it returns all of the text even though it will not all display.
I am trying to fill out a form with a limited size multiline text field that has the notation to attach another page for more details. I would like to limit the text to that which will display and generate the added detail page.
Thanks for indulging a PDFbox newbie.
There is no direct way to find that out, as the details of the text layout such as line breaks, padding, and line spacing are hidden inside the non-public class PlainTextFormatter in the org.apache.pdfbox.pdmodel.interactive.form package. So you'd need to replicate that code.
PDFBox tries to resemble the calculations done by Adobe Acrobat and Adobe Reader, but the details of such calculations are not part of the PDF specification. So your own calculation is only valid for a similar layout model. Other form filling applications might have a slightly different layout model, and as a result your results will not apply to them.
In addition to that, Acrobat (and PDFBox) place text even though it might be partially clipped. Look at the results of the AlignmentTest.java unit test to see what I mean. So one might have a different expectation of what 'fitting' really means.
As I've thought about passing the information about which text fitted back to the calling application anyway, I've opened an enhancement request for that: https://issues.apache.org/jira/browse/PDFBOX-3413
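Until something like that exists, a crude estimate is about all you can do from the outside. The following Python sketch only illustrates the idea of splitting the text into a visible part and an overflow part; it is not PDFBox code and does not reproduce PlainTextFormatter. The function name, the average glyph width of 0.5 × font size and the line height of 1.2 × font size are assumptions for this example, and real fonts and viewers will differ.

```python
import textwrap


def split_for_field(text, field_width, field_height, font_size,
                    avg_glyph_width=0.5, leading=1.2):
    """Split text into (visible, overflow) using a rough layout estimate.

    avg_glyph_width and leading are assumed factors of the font size,
    not values taken from any PDF library.
    """
    chars_per_line = max(1, int(field_width / (font_size * avg_glyph_width)))
    max_lines = max(1, int(field_height / (font_size * leading)))

    # textwrap breaks on whitespace, so existing newlines are collapsed.
    lines = textwrap.wrap(text, width=chars_per_line)

    visible = " ".join(lines[:max_lines])
    overflow = " ".join(lines[max_lines:])
    return visible, overflow


visible, overflow = split_for_field("one two three four five six",
                                    field_width=60, field_height=30,
                                    font_size=10)
print(visible)   # text to put into the form field
print(overflow)  # text to move to the attached detail page
```

Leaving a safety margin (for example, one line fewer than the estimate) makes it less likely that the visible part gets clipped in a viewer with a different layout model.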

iTextSharp reverses Arabic text when filling combed text field

I have a problem with iTextSharp which looks like it could be a bug.
I have a combed text field and when using iTextSharp to add Arabic text to it, the Arabic letters initially appear reversed when the field is "highlighted". So 'ف ا د ي' appears 'ي د ا ف'.
The moment I click on the field, the highlight disappears and the text appears in the correct direction.
This happens regardless of the direction and alignment and only happens in combed text fields.
Can anyone offer any solutions to this?
Note: I've added the iText tag as well because I have a hunch that this issue is not specific to iTextSharp only and I hope I can replicate any workarounds or solutions in iTextSharp. Regards,
You can usually fix this by setting GenerateAppearances to false on the form object.
Annotations in a PDF (which form fields are a version of) can have different "states" and for each of these "states" you can specify how you want a renderer to display that state. For instance, a checkbox can either be "checked" or "not checked" which is given, but how to render that actual checkmark isn't. Maybe an "X", maybe a ✓, maybe a ☑ or maybe something totally different. These different states are called their Appearance State.
If you don't set an appearance state for an annotation then you are effectively surrendering control of that state to the PDF renderer and letting it do whatever it wants.
Adobe's renderers (Acrobat and Reader) are the de facto standard for PDF renderers and recent ones are actually really good at "filling in the blanks", especially when it comes to things like RTL and many non-English/Latin things. Other renderers out there, including Google's, Apple's, Microsoft's and even your printer might not be as good at this, however, so you might want to test this.

Possible to control PDF layout with iText?

I'm writing some logic to build a large single PDF file that our users can print at their convenience. I'm using Java's iText library (through Clojure's clj-pdf).
I'm trying to have the PDF show the same exact template form on every single page, however I can't seem to find any documentation or indication that one can have PDF content "fit to a page".
The text in these forms varies a little bit, so there's a chance it might require more or fewer text lines per page. This means that the content has a chance of spilling over to the next page, or being too short, making the next page creep up into the previous one, breaking the requirement of "one form per page" for the rest of the document.
I'm trying to figure out if my only option is to manually check the length of the text on each page and potentially crop it by hand if it goes over n lines, or if the PDF format somehow supports a smart way of having paragraphs+tables+headings all fit on one page. Some UI systems allow you to control how spill-over is handled, anywhere from cropping to resizing the font, so I'm curious if PDF supports anything of that sort.
Edit: ended up going with pagebreaks for simplicity, wasn't aware of that option when I wrote this question.
If you want to take control over the space taken by text, for instance to fit it on a single page, the way to go would be to create a ColumnText object and to add the content in simulation mode. If the text fits the page, add it for real. If it doesn't, use a smaller font size. This is demonstrated in the MovieAds example where snippets of text are fitted into AcroForm fields.
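The simulate-then-commit loop described here can be sketched independently of iText. The Python below is not the ColumnText API; it stands in for the simulation step with a rough line count, and the function names, the 0.5 × font size average glyph width and the 1.2 × font size line height are assumptions for this example. With iText you would replace `lines_needed` with a real simulation run.

```python
import textwrap


def lines_needed(text, box_width, font_size, avg_glyph_width=0.5):
    """Stand-in for a layout simulation: estimate the wrapped line count."""
    chars_per_line = max(1, int(box_width / (font_size * avg_glyph_width)))
    return len(textwrap.wrap(text, width=chars_per_line))


def largest_fitting_size(text, box_width, box_height,
                         start_size=12.0, min_size=6.0, step=0.5, leading=1.2):
    """Return the largest font size at which the text fits, or None."""
    size = start_size
    while size >= min_size:
        needed_height = lines_needed(text, box_width, size) * size * leading
        if needed_height <= box_height:
            return size  # fits: this is where you would add the text for real
        size -= step     # does not fit: try again with a smaller font
    return None          # even the minimum size overflows the box


print(largest_fitting_size("aaaa bbbb", box_width=30, box_height=20))
```

A `None` result is the case where shrinking is not enough and the content has to be cropped or moved to another page.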

Creating RTF text for clipboard and sharing DataPackage in WinRT

I'm sure this is just a google search away, but I can't find the right search terms to find what I'm looking for.
I've created a DataPackage that has both HTML and plain text content. I've used this in my copy and my sharing code and it works fine. I now want to create RTF output as some apps don't seem to accept HTML clipboard content.
I'm looking for a good guide to making RTF text that can be added to the DataPackage. I just need simple formatting including changing the font family, font size, font weight and adding newlines. The data comes from a list of objects that I want to serialise as RTF, not from a text control on the screen.
WordPad outputs fairly clean RTF and some other text editors do as well. If that's not enough, you can download the RTF Specification 1.9.1 although like any specification that's probably overkill for what you're doing.
You can also use the SaveToStream method on the Document property of a RichEditBox from a Metro style app to share out as well.
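For formatting this simple, the RTF can also be assembled by hand. Here is a minimal sketch in Python (the string format is the same whatever language builds it); the function names and the `(text, font, size, bold)` run structure are made up for this example. It uses only basic control words: \fonttbl and \fN for the font family, \fsN for the size in half-points, \b for bold, \par for a new paragraph and \uN for non-ASCII characters.

```python
import struct


def rtf_escape(text):
    """Escape plain text for use inside an RTF group."""
    out = []
    for ch in text:
        if ch in "\\{}":
            out.append("\\" + ch)          # RTF special characters
        elif ch == "\n":
            out.append("\\par\n")          # newline becomes a paragraph break
        elif ord(ch) < 128:
            out.append(ch)
        else:
            # \uN takes a signed 16-bit value per UTF-16 code unit;
            # the "?" is the fallback for readers without Unicode support.
            units = struct.iter_unpack("<h", ch.encode("utf-16-le"))
            out.extend(f"\\u{unit}?" for (unit,) in units)
    return "".join(out)


def build_rtf(runs):
    """Build an RTF document from (text, font, size_pt, bold) tuples."""
    fonts = []
    for _, font, _, _ in runs:
        if font not in fonts:
            fonts.append(font)
    table = "".join("{\\f%d %s;}" % (i, name) for i, name in enumerate(fonts))
    body = "".join(
        "{\\f%d\\fs%d%s %s}" % (fonts.index(font), round(size_pt * 2),
                                "\\b" if bold else "", rtf_escape(text))
        for text, font, size_pt, bold in runs
    )
    return "{\\rtf1\\ansi\\deff0{\\fonttbl" + table + "}" + body + "}"


print(build_rtf([("Title\n", "Arial", 14, True),
                 ("Body text", "Consolas", 10, False)]))
```

Each run is wrapped in its own group so its formatting does not leak into the next run. The resulting string is what you would hand to the DataPackage as its RTF content.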

Making PDF annotations with Quartz 2D

I am working on PDFs using Leaves. I'm unable to figure out how to make annotations. I haven't used Quartz 2D much and would like some direction.
Adding write annotation support is hard.
Quartz 2D won't help you there.
You need to parse the PDF manually (e.g. with NSScanner) and build up the XRef tree of all the PDF objects. Then you write a new trailer that replaces the /Page object and attaches all the new annotation data. It's quite hard to get right, and the 2000-page PDF reference is not very helpful on that. I worked the better part of a year on proper annotation support (Highlight, Underscore, Strikeout, Ink, Note, ...).
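To give an idea of where that parsing starts: a reader first looks near the end of the file for the startxref keyword, which holds the byte offset of the most recent cross-reference section. A minimal sketch in Python, used here only for brevity (the function name is made up, and the 2048-byte tail window is an assumption), handling nothing beyond this first step:

```python
import re


def find_startxref(pdf_bytes):
    """Return the byte offset stored after the last 'startxref' keyword."""
    tail = pdf_bytes[-2048:]  # assumed window; the keyword sits near the end
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if not matches:
        raise ValueError("no startxref/%%EOF pair found")
    # A file with appended updates has several pairs; the last one wins.
    return int(matches[-1])


sample = (b"%PDF-1.4\n"
          b"xref\n0 1\n0000000000 65535 f \n"
          b"trailer\n<< /Size 1 >>\n"
          b"startxref\n9\n%%EOF\n")
print(find_startxref(sample))
```

From that offset you read the cross-reference section and its trailer. When a file has been updated incrementally, the trailer's /Prev entry points to the previous cross-reference section, and that chain is what you extend when you append new annotation objects.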
And when you want highlight annotations, you also want text selection (else the user would have to free-draw a highlight - not a nice experience.) Getting the correct frames for the text glyphs for all PDF font types is another level of horror; in PDF there's no notion of a word or a column. Just single glyphs. The rest is algorithms and guessing.
I even spoke with some Apple engineers about how they did it [text selection, annotations], and they told me a three-person team worked about three years on their implementation.