PDF: standard format for highlights?

Many PDF viewers (for example Apple's Preview) allow one to select text and highlight it, like a regular yellow marker highlighter. These highlights can then be saved into the PDF file and reopened in other viewer apps.
Is there a standard part of the de facto PDF specification that defines highlights?

The PDF standard (section 3.6.2 of version 1.7, according to this post) deals with annotations; this is what PDF viewers use to save highlights.

There are multiple ways to achieve highlights - you can use shapes (rectangles) with transparency or use the highlight annotation type (12.5.6.10 Text Markup Annotations).
See Adobe's PDF Spec.

Related

Highlighting text?

I've been searching for a while to find out how highlighting text (in books in PDF, EPUB, or MOBI format) works programmatically and what kind of code or technology is behind this feature, yet I couldn't find anything. If you know, please share any hints here.
Thanks in advance
For PDFs you can highlight text by adding a highlight text markup annotation:
12.5.6.10 Text Markup Annotations
Text markup annotations shall appear as highlights, underlines, strikeouts (all PDF 1.3), or jagged (“squiggly”) underlines (PDF 1.4) in the text of a document. When opened, they shall display a pop-up window containing the text of the associated note. Table 179 shows the annotation dictionary entries specific to these types of annotations.
(ISO 32000-1)
You can find details in Table 179. For a specification of annotations in general, read the earlier subsections of section 12.5 Annotations.
Alternatively, you can add the highlighting to the page content itself. This has the disadvantage that others cannot easily change the highlight (which might actually be an advantage in some use cases). Depending on which PDF viewers you need to support, you might be forced to do this, though.
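As a rough illustration of what the annotation variant looks like inside the file, the following Python sketch writes out the dictionary of a highlight annotation in PDF syntax. The helper names (`pdf_string`, `highlight_annotation_dict`) and their parameters are made up for this example; in practice a PDF library would create the object and add it to the page's /Annots array. The /QuadPoints order used here (top-left, top-right, bottom-left, bottom-right) is the one viewers commonly write, which is an assumption you should check against the viewers you need to support.

```python
def pdf_string(text):
    """Escape ASCII text for use as a PDF literal string (no Unicode handling)."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return "(" + escaped + ")"


def highlight_annotation_dict(rect, color=(1, 1, 0), note=""):
    """Return the PDF syntax of a highlight annotation covering one rectangle.

    rect is (x0, y0, x1, y1) in default user space units, y0 being the bottom.
    """
    x0, y0, x1, y1 = rect
    # One quadrilateral = 8 numbers. Order is an assumption, see the text above.
    quad = (x0, y1, x1, y1, x0, y0, x1, y0)

    def nums(values):
        return " ".join(f"{v:g}" for v in values)

    return (
        "<< /Type /Annot /Subtype /Highlight"
        f" /Rect [{nums(rect)}]"
        f" /QuadPoints [{nums(quad)}]"
        f" /C [{nums(color)}]"       # highlight colour, here RGB yellow
        " /F 4"                       # Print flag set, so the highlight prints
        f" /Contents {pdf_string(note)} >>"
    )


print(highlight_annotation_dict((72, 700, 200, 712), note="key point"))
```

A highlight spanning several lines of text would simply carry more than one quadrilateral in /QuadPoints, with /Rect enclosing all of them.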

Embed TrueType fonts in existing PDF

I know Acrobat won't do it because of the licensing restrictions, etc.
Does anyone know any program that will just embed True Type fonts anyway in an existing PDF (or print with embedded fonts to a new PDF, etc.)?
To expound mark's answer:
The sample EmbedFontPostFacto.java (EmbedFontPostFacto.cs) from chapter 16 of iText in Action — 2nd Edition shows how you can embed a given True Type font in an existing PDF.
Be aware, though, that certain assumptions are made here; it's just an example, after all. When generalizing the code for generic PDFs and fonts, the font dictionary should be checked more thoroughly, and embedding the font file can require slightly different entries to be changed. In that case let the specification ISO 32000-1:2008 (especially chapter 9 Text) be your guide.
I believe iText allows you to manipulate the fonts in a PDF file.

Extract font and its corresponding cmap in PDF

I have tried several ways to extract fonts from a PDF, viz. fontforge, mupdf, pdfparser in C#, and also some Python scripts. But I am confused about how to get the exact pairing of a font and its cmap as embedded in the PDF. Please point me to the right approach for getting exact pairs of fonts and their cmaps.
As mentioned in my first comment, that should be easy using iText or iTextSharp or any other such library that allows you to access low-level PDF objects.
In case of iText(Sharp), ListUsedFonts.java and ListUsedFonts.cs can present starting points for you; they inspect all the font dictionaries in a PDF file accessible via at least one page. Instead of the simple output of those examples, simply export all the information you need. For this, ISO 32000-1:2008 should be your reference guide.

how can I generate a summary of the highlights in a pdf file?

We all know that we can highlight text in a PDF file using either Adobe Acrobat or Preview on Mac. I'm wondering how I can extract all these highlights from a PDF file and generate a summary (a note kind of thing).
The following post
PDF: standard format for highlights?
points out that there are multiple ways to do highlighting. Will it be a challenge to distinguish the original content of the file from the user-added highlights if shapes with transparency are used to achieve highlights?
Details about this can be found in open source PDF parsing and rendering libraries; you will have to read their code, or their documentation if available.
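For highlights stored as annotations, telling them apart from the original content is straightforward, because each one is an annotation dictionary with /Subtype /Highlight. A minimal sketch in Python, assuming some PDF library has already turned each page's /Annots entries into plain dicts (the input shape and the name `summarize_highlights` are hypothetical, not any library's API):

```python
def summarize_highlights(pages):
    """Build one summary line per highlight annotation.

    pages: list with one item per page; each item is a list of annotation
    dicts keyed by PDF names such as "/Subtype" (assumed input shape).
    """
    lines = []
    for page_number, annotations in enumerate(pages, start=1):
        for annot in annotations:
            if annot.get("/Subtype") != "/Highlight":
                continue  # links, notes, form fields, ...
            note = annot.get("/Contents") or "(no note text)"
            # Each highlighted region is one quadrilateral of 8 numbers.
            regions = len(annot.get("/QuadPoints", [])) // 8
            lines.append(f"p.{page_number}: {note} [{regions} region(s)]")
    return lines


sample = [
    [{"/Subtype": "/Highlight", "/Contents": "main claim", "/QuadPoints": [0] * 8},
     {"/Subtype": "/Link"}],
    [{"/Subtype": "/Highlight", "/QuadPoints": [0] * 16}],
]
for line in summarize_highlights(sample):
    print(line)
```

The annotation does not necessarily carry the highlighted words themselves: /QuadPoints only gives the regions, so the text under them has to come from text extraction on the page. Highlights drawn as transparent shapes in the page content carry no such marker, so separating them from original content would likely need heuristics.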

Hide text in PDF when printing

Is it possible to write some text in a PDF (more specifically a link), that will not be printed when sent to a printer, but only be shown in a screen reader?
If it's possible, then any pointers to .NET libraries for writing PDFs that have this as a feature are very welcome.
There are at least two ways to achieve such an effect with PDF:
put the link or any element you do not want to print on a separate layer (in PDF spec lingo: "optional content") and set this layer as non-printable;
put the link into the PDF as an "annotation" and set this annotation as non-printable.
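For the annotation route, printing is controlled by the annotation's /F (flags) entry: in ISO 32000-1, bit position 3 (value 4) is the Print flag, and an annotation without it is not printed. A small Python sketch of the bit handling follows; the function names are invented for illustration, and writing the value back into the PDF is left to whatever library you use.

```python
# Annotation flag bits from ISO 32000-1 (only the ones used here).
HIDDEN = 1 << 1    # bit position 2: neither displayed nor printed
PRINT = 1 << 2     # bit position 3: printed when the page is printed
NO_VIEW = 1 << 5   # bit position 6: not displayed on screen


def screen_only(flags):
    """Return flags for an annotation that is shown on screen but not printed."""
    return flags & ~(PRINT | HIDDEN | NO_VIEW)


def is_printed(flags):
    """True if an annotation with these flags appears on paper."""
    return bool(flags & PRINT) and not flags & HIDDEN


original = 4                      # a typical value: Print flag only
print(screen_only(original))      # value to store in /F
print(is_printed(screen_only(original)))
```

Other flag bits (for example ReadOnly) pass through unchanged, so the helper can be applied to an existing /F value.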