I have a PDF document to which I want to add internal hyperlinks.
Specifically, page 1 contains a table of contents which I want to make clickable.
My idea is to create rectangular boxes in predetermined locations on page 1, which should link to pages 2, 3, ...
I found this post which talks about adding internal hyperlinks using the method I described above.
http://bugs.ghostscript.com/show_bug.cgi?id=691531
However, when I try to use this technique on my file, the script just ADDS new pages containing the rectangle and hyperlink.
I need it to overlay the hyperlink on the existing contents of my first page.
You can do this with Ghostscript, using the pdfmark operator.
For some introduction to the pdfmark topic, see also Thomas Merz's PDFmark Primer.
For an example to achieve a similar thing, see this answer: Merge PDF's with PDFTK with Bookmarks?
Alternatively, you could...
...use qpdf to expand all (compressed) internal PDF streams into ASCII,
...edit the PDF source code (using the know-how acquired from the PDFmark Primer),
...use qpdf again to re-compress the PDF streams.
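The qpdf route can be scripted roughly as follows. This is a sketch that assumes qpdf and its companion tool fix-qdf are on the PATH; the helper names are hypothetical. `--qdf` writes an uncompressed, text-editable file, fix-qdf repairs stream lengths and the cross-reference table after hand edits (writing the result to standard output), and a plain qpdf run compresses the streams again.

```python
import subprocess
from typing import List


def expand_cmd(src: str, editable: str) -> List[str]:
    # QDF mode: uncompressed streams and no object streams, so the file can be
    # edited in a text editor.
    return ["qpdf", "--qdf", "--object-streams=disable", src, editable]


def recompress_cmd(fixed: str, dst: str) -> List[str]:
    # A plain qpdf run compresses uncompressed streams by default.
    return ["qpdf", fixed, dst]


def finish(editable: str, fixed: str, dst: str) -> None:
    # fix-qdf recomputes stream lengths and the xref table after manual edits;
    # it writes the repaired file to stdout, so redirect it into `fixed`.
    with open(fixed, "wb") as out:
        subprocess.run(["fix-qdf", editable], stdout=out, check=True)
    subprocess.run(recompress_cmd(fixed, dst), check=True)
```

Typical use: run `expand_cmd("original.pdf", "editable.pdf")` with `subprocess.run`, add the link annotations to page 1 by hand, then call `finish("editable.pdf", "fixed.pdf", "output.pdf")`.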
This is what I used:
Ghostscript call from MATLAB (the arguments passed to the Ghostscript executable):
-o output.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress original.pdf script.ps
Postscript code saved in script.ps:
[ /Rect [10 10 50 50]
/Page 2
/SrcPg 1
/Subtype /Link
/ANN pdfmark
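With more than one TOC entry, writing script.ps by hand gets tedious. A small generator can emit one link annotation per box. This is a sketch: the function name is hypothetical, the rectangles are assumed to be measured in PDF points as [llx lly urx ury], and `/Border [0 0 0]` (which hides the default link border) is an optional addition not used in the script above.

```python
from typing import Iterable, Tuple

Rect = Tuple[float, float, float, float]  # (llx, lly, urx, ury) in PDF points


def link_pdfmarks(links: Iterable[Tuple[Rect, int]], src_page: int = 1) -> str:
    """Return one /ANN pdfmark per (rectangle, destination page) pair."""
    parts = []
    for (llx, lly, urx, ury), dest in links:
        parts.append(
            f"[ /Rect [{llx} {lly} {urx} {ury}]\n"
            f"/Page {dest}\n"
            f"/SrcPg {src_page}\n"
            "/Border [0 0 0]\n"  # optional: no visible border around the link
            "/Subtype /Link\n"
            "/ANN pdfmark\n"
        )
    return "".join(parts)
```

For example, `open("script.ps", "w").write(link_pdfmarks([((10, 10, 50, 50), 2), ((10, 60, 50, 100), 3)]))` produces a script.ps that can be passed to the same Ghostscript call.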
There is currently (as of 2020) a piece of freeware for Windows that allows adding hyperlinks. PDF-XChange Editor, which has a free demo version, lets you manually draw hyperlinks on the page (arbitrary rectangles) and set the target location (page). It is offered at no cost, but it is not "free as in libre" software.
I know from some colleagues, who design our leaflets in InDesign and save them as PDFs, that there is a setting to open the file in full-page mode.
I wrote a script to "merge" some of these documents using the Ghostscript pdfwrite device and the option -dPDFFitPage (edited after KenS' answer).
Here is my full command:
gs -dBATCH -sDEVICE=pdfwrite -dNO_PDFMARK_OUTLINES -dPDFFitPage -o output.pdf cover.pdf input1.pdf input2.pdf input3.pdf pdfmarks
But -dPDFFitPage does not do what I expect. The page width is fitted to the screen, but I would like the whole page to fit on the screen. I also heard that using "/FIT" in the pdfmarks would help, but it doesn't either.
If anybody can help me, I would be very thankful.
Best regards
Mike
OK, after a few more hours of reading the pdfmark reference and searching Google, I came across my ultimate and satisfying solution:
[ /PageMode /UseOutlines /Page 1 /View [ /Fit] /PageLayout /SinglePage /DOCVIEW pdfmark
So I simply added /PageLayout /SinglePage, and now the file opens in full-page mode in the reader window, shows the bookmarks (/UseOutlines), and scrolls page by page, so every step of the mouse wheel moves one page. This works perfectly now.
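If the pdfmarks file is generated by the merge script, the DOCVIEW line can be built alongside it. This is a minimal sketch with hypothetical function names; the defaults reproduce the DOCVIEW pdfmark above, and the command mirrors the merge command in the question, minus -dPDFFitPage, which does not affect how viewers open the file.

```python
from typing import List


def docview_pdfmark(page_mode: str = "/UseOutlines", page: int = 1,
                    view: str = "[/Fit]", page_layout: str = "/SinglePage") -> str:
    """DOCVIEW pdfmark telling the viewer how to open the document."""
    return (f"[ /PageMode {page_mode} /Page {page} /View {view} "
            f"/PageLayout {page_layout} /DOCVIEW pdfmark\n")


def merge_cmd(inputs: List[str], pdfmarks_path: str, output: str) -> List[str]:
    # As in the question's command, the pdfmarks file follows the PDF inputs.
    return ["gs", "-sDEVICE=pdfwrite", "-dNO_PDFMARK_OUTLINES",
            "-o", output, *inputs, pdfmarks_path]
```

Write `docview_pdfmark()` (plus any /OUT bookmark lines) to a file named pdfmarks, then run `merge_cmd(["cover.pdf", "input1.pdf"], "pdfmarks", "output.pdf")` with `subprocess.run`.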
I found a solution to my problem; perhaps it will help others, so I am posting it as an answer. KenS' answer was a big help in solving my problem. Thanks to him.
[ /PageMode /UseOutlines
/Page 1 /View [/Fit]
/DOCVIEW pdfmark
This sets the magnification of the PDF file to "window size". It works pretty well with Acrobat Reader and Acrobat Standard; I have not tested other readers.
Best regards
Mike
There is no option -dpdfwriter. The fact that PDFFitPage doesn't do what you want isn't surprising: it has no effect on what a PDF viewer will do. This option (which is described in the documentation) only has an effect when used with a predefined fixed media size. It creates a new PDF in which the content of the original PDF is scaled so that it fits onto the fixed media size.
If you want to include directions to PDF viewers on how to open PDF documents, then you need to look at the pdfmark operator. Specifically, you will need to construct a DOCVIEW pdfmark as described on pages 29 and 30 of the version 9 pdfmark reference.
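For completeness, this is the kind of call where PDFFitPage does do something. It is a sketch under stated assumptions: the function name is hypothetical, and the fixed media is given either by a Ghostscript paper size name (A4 here) or by an explicit size in points. -dFIXEDMEDIA keeps the PDF interpreter from switching to each page's own MediaBox, so PDFFitPage has a fixed target to scale into.

```python
from typing import List, Optional, Tuple


def fit_page_cmd(src: str, dst: str, paper: str = "a4",
                 size_pt: Optional[Tuple[float, float]] = None) -> List[str]:
    """Rewrite src so every page is scaled onto one fixed media size."""
    cmd = ["gs", "-o", dst, "-sDEVICE=pdfwrite"]
    if size_pt is not None:
        width, height = size_pt
        cmd += [f"-dDEVICEWIDTHPOINTS={width:g}", f"-dDEVICEHEIGHTPOINTS={height:g}"]
    else:
        cmd.append(f"-sPAPERSIZE={paper}")
    # FIXEDMEDIA pins the page size; PDFFitPage scales the content to fit it.
    cmd += ["-dFIXEDMEDIA", "-dPDFFitPage", src]
    return cmd
```

The result is a PDF with different page geometry; it still says nothing about how a viewer should display the pages.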
I am able to successfully create PDF files using PDFsharp and MigraDoc.
Two private fonts (OTF format) are used to create a single-page PDF. The created PDF contains both fonts fully embedded.
Unfortunately, each font also contains Chinese characters and is therefore about 4 MB in size, resulting in a PDF of about 9 MB (containing only one page with a bit of text!).
Is it possible to embed only a subset of those fonts to save space?
The thing is, I need to create a few thousand PDF files, so file size is crucial.
Is there a special setting I can use?
Can anyone point me in the right direction?
Update:
I used FontForge to extract the embedded fonts and found that the fonts extracted from the PDF match the full font files exactly.
So no font subsetting is used at all.
Taking a look into the PDFsharp sources I found the function
public OpenTypeFontface CreateFontSubSet(Dictionary<int, object> glyphs, bool cidFont)
which is commented as follows: Creates a new font image that is a subset of this font image containing only the specified glyphs.
Which is exactly what I want to be used here.
The thing I do not understand is why this function seems not to get used when creating my PDF.
What criteria need to be met for it to be used?
I just found a solution to my problem that requires no extensive fiddling with additional PDF frameworks: I can create font subsets using Ghostscript (command line).
Ghostscript takes the (PDFsharp-)generated file and rewrites it, optimizing the fonts along the way. Here is the command-line solution:
gswin64 -dCompatibilityLevel=1.4 -dPDFSETTINGS=/printer -dCompressFonts=true -dSubsetFonts=true -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=optimized.pdf -c ".setpdfwrite <</NeverEmbed [ ]>> setdistillerparams" -f my_pdfsharp.pdf
My file size of about 9 MB is now down to 51 KB. Yihaa!!!
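Since the question mentions a few thousand files, the same command can be driven from a script. This is a sketch: the function names are hypothetical, the argument list copies the gswin64 command above, and the executable lookup (trying gswin64c, gswin64, then gs) is an assumption about a typical installation. Passing a list to subprocess avoids shell quoting, so the -c PostScript fragment is a single argument without surrounding quotes.

```python
import os
import shutil
import subprocess
from typing import Iterable, List, Optional, Tuple


def subset_fonts_cmd(src: str, dst: str, gs: str = "gs") -> List[str]:
    """Argument list mirroring the Ghostscript command above."""
    return [
        gs, "-dCompatibilityLevel=1.4", "-dPDFSETTINGS=/printer",
        "-dCompressFonts=true", "-dSubsetFonts=true",
        "-dNOPAUSE", "-dBATCH", "-sDEVICE=pdfwrite",
        f"-sOutputFile={dst}",
        "-c", ".setpdfwrite <</NeverEmbed [ ]>> setdistillerparams",
        "-f", src,
    ]


def find_gs() -> str:
    # Console build on Windows first, then the usual Unix name.
    for name in ("gswin64c", "gswin64", "gs"):
        path = shutil.which(name)
        if path:
            return path
    raise FileNotFoundError("Ghostscript executable not found on PATH")


def optimize_all(pairs: Iterable[Tuple[str, str]], gs: Optional[str] = None) -> None:
    gs = gs or find_gs()
    for src, dst in pairs:
        subprocess.run(subset_fonts_cmd(src, dst, gs), check=True)
        print(f"{src}: {os.path.getsize(src)} -> {os.path.getsize(dst)} bytes")
```

Calling `optimize_all([("my_pdfsharp.pdf", "optimized.pdf")])` runs one conversion and prints the before/after sizes.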
Some fonts have a "loca" table, some do not. The loca table stores the offsets to the locations of the glyphs in the font.
CreateFontSubSet is only called, and can only be called, for fonts with a loca table, because that table provides the information needed to create subsets.
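The loca table belongs to fonts with TrueType (glyf) outlines; OpenType fonts with CFF outlines, which is common for .otf files, have no loca table. That would plausibly explain why the OTF fonts in the question were never subsetted, though this is an inference, not something confirmed above. The sketch below reads the table directory of a single font file (not a .ttc collection) using only the standard library; the function names are hypothetical.

```python
import struct
from typing import List


def sfnt_table_tags(data: bytes) -> List[str]:
    """List the table tags of a single TrueType/OpenType font."""
    # Offset table: sfntVersion (4 bytes), numTables (uint16), then
    # searchRange, entrySelector, rangeShift (uint16 each) = 12 bytes.
    (num_tables,) = struct.unpack(">H", data[4:6])
    tags = []
    for i in range(num_tables):
        start = 12 + 16 * i  # table record: tag, checksum, offset, length
        tags.append(data[start:start + 4].decode("ascii"))
    return tags


def outline_format(data: bytes) -> str:
    # b"OTTO" marks CFF outlines; 0x00010000 or b"true" marks TrueType outlines.
    if data[:4] == b"OTTO":
        return "CFF"
    if data[:4] in (b"\x00\x01\x00\x00", b"true"):
        return "TrueType"
    return "unknown"


def has_loca(data: bytes) -> bool:
    return "loca" in sfnt_table_tags(data)
```

Read the font with `data = open("MyFont.otf", "rb").read()` and check `outline_format(data)` and `has_loca(data)`.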
I have a dozen essays as PDFs which I want to combine into one concatenated master PDF with a table of contents where each entry is a clickable link to the first page of each essay. The TOC could be either a page with internal links or a proper PDF TOC.
A command-line solution on Linux and macOS would be best. So far I have used QPDF, which works great for concatenating the essay PDFs, but it does not build a TOC.
It is a one-off problem, so I am happy to write some (bash, Python or other) scripting code to generate this TOC. For utility it is important that the links are clickable.
Any idea how to do this?
As I already noted, you can create a TOC page manually and append or prepend it to the file.
To make the TOC clickable, you need to add link annotations to it. After some quick googling I put together the following example using Ghostscript:
gs -o output.pdf -sDEVICE=pdfwrite -dPDFSETTINGS=/prepress input.pdf an.txt
And an.txt file contains the following:
[ /Subtype /Link
/SrcPg 1
/Rect [10 10 50 50]
/Page 2
/ANN pdfmark
Here SrcPg is the page number to put the annotation on; Rect is the area to make clickable, given as [llx lly urx ury] (lower-left and upper-right corners) in points; Page is the destination page number.
You can find more details on annotation syntax here and here. Hope it helps.
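For the essay case, the destination pages follow from the page count of each essay. The sketch below computes the first page of each essay after a prepended TOC and emits both an /OUT bookmark (the "proper PDF TOC") and an /ANN link per entry. Assumptions, all hypothetical: the TOC occupies the first `toc_pages` pages, all link rows sit on page 1 at fixed vertical spacing starting below y=720, and titles are plain ASCII (non-ASCII titles would need a UTF-16 encoded string instead).

```python
from typing import List


def ps_string(text: str) -> str:
    # Escape backslashes and parentheses for a PostScript literal string.
    return "(" + text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)") + ")"


def start_pages(page_counts: List[int], toc_pages: int = 1) -> List[int]:
    """First page of each essay in the merged file (TOC pages come first)."""
    pages, next_page = [], toc_pages + 1
    for n in page_counts:
        pages.append(next_page)
        next_page += n
    return pages


def toc_pdfmarks(titles: List[str], page_counts: List[int], toc_pages: int = 1,
                 top: float = 720, row_height: float = 18,
                 left: float = 72, right: float = 540) -> str:
    marks = []
    for i, (title, page) in enumerate(zip(titles, start_pages(page_counts, toc_pages))):
        # Bookmark (outline entry) shown in the viewer's sidebar.
        marks.append(f"[ /Title {ps_string(title)} /Page {page} /OUT pdfmark")
        # Clickable row on TOC page 1; rows assumed laid out top-down.
        lly = top - (i + 1) * row_height
        marks.append(f"[ /Rect [{left:g} {lly:g} {right:g} {lly + row_height:g}] "
                     f"/Border [0 0 0] /Page {page} /SrcPg 1 /Subtype /Link /ANN pdfmark")
    return "\n".join(marks) + "\n"
```

Page counts can come from `qpdf --show-npages essay.pdf`; the output file is then appended after the inputs, e.g. `gs -o output.pdf -sDEVICE=pdfwrite toc.pdf essay1.pdf essay2.pdf marks.ps`.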
I have a PDF file which contains several slides per page, including text (not only images).
This pdf was probably created using pdfnup.
Can I revert the pdfnup operation so that each slide is shown on one page?
As far as I know, there is no simple 'undo' operation.
However, the following answers show the principle of how to achieve the undo-equivalent operation using Ghostscript:
Convert PDF 2 sides per page to 1 side per page (Superuser)
How can I split a PDF's pages down the middle? (Superuser)
Cropping a PDF using Ghostscript 9.01 (Stackoverflow)
PDF - Remove White Margins (Stackoverflow)
(Should these not help you find the final solution, ask again. But to come up with a fully working command line, I'd first need the complete output of the following command: pdfinfo -f 1 -l 100 -box your.pdf.)
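The principle from the first two linked answers can be sketched for the common case of two slides side by side on an unrotated page. Assumptions: the MediaBox width and height in points are taken from `pdfinfo -box`, the helper name is hypothetical, and the trick relies on Ghostscript's PageOffset page-device parameter. Each half is rendered onto a fixed page half as wide as the original, shifting the content left for the right half.

```python
from typing import List, Tuple


def half_page_cmds(src: str, width_pt: float, height_pt: float,
                   left_out: str = "left.pdf",
                   right_out: str = "right.pdf") -> Tuple[List[str], List[str]]:
    half = width_pt / 2
    common = ["gs", "-sDEVICE=pdfwrite", "-dFIXEDMEDIA",
              f"-dDEVICEWIDTHPOINTS={half:g}",
              f"-dDEVICEHEIGHTPOINTS={height_pt:g}"]
    # Left half: content stays put; the narrower fixed page clips the right side.
    left = common + ["-o", left_out,
                     "-c", "<</PageOffset [0 0]>> setpagedevice", "-f", src]
    # Right half: shift content left by half the width so it lands on the page.
    right = common + ["-o", right_out,
                      "-c", f"<</PageOffset [{-half:g} 0]>> setpagedevice", "-f", src]
    return left, right
```

The pages of left.pdf and right.pdf then still have to be interleaved (recent qpdf releases offer a --collate option for that). Layouts with more than two slides, or rotated pages, need different offsets.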
I am using
gs -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=book.pdf -f front-matter.pdf fulltext-0.pdf fulltext-1.pdf back-matter.pdf
to create a single PDF document from a series of PDF documents. I was going to create a new table of contents and include it using the pdfmark mechanism. Then I noticed that the original files already have bookmarks in them; however, they refer to the original page numbers, not the ones in the combined document.
I am looking for one of two possible solutions: remove the original bookmarks, or make use of the original bookmarks but somehow update their page references...
As so often the case, someone has walked the same path before you...
unfolding disasters has worked out a solution to this very problem. His Python script pdf-merge.py first invokes pdftk with its dump_data switch to retrieve all the bookmark information. It then keeps track of the number of pages in each merged document and offsets each bookmark's page number by the total page count of all PDF documents included before the current one. So it is close to, but not the same as, the two-pass approach of KenS: it first discovers the bookmarks using pdftk and then creates a new bookmark file with correct page numbers. It also manages to turn the original pdfmark instructions (which gs would normally preserve) into no-ops. I won't pretend I understand how that last part works...
However, the script does all I need including the option of tweaking the bookmark file before the final writing. Very neat and hat tip to Trevor King.
In general pdfwrite doesn't know you are appending files, so it preserves bookmark and other 'metadata' information on the assumption that you will want it in the output.
However, when you are combining PDF files, preserving the information won't work, as the page numbers for the second and subsequent files will be incorrect.
So you need a 2-pass approach, first merge all the files, discarding the bookmarks, then 'convert' the merged file and add pdfmarks to set the correct bookmarks.
There is currently no option (with pdfwrite) to not preserve bookmarks. I think you would need to modify the Ghostscript PDF interpreter's PostScript files to achieve this. You might try setting -dDOPDFMARKS=false, but I doubt that will work.
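The renumbering step shared by pdf-merge.py and the second pass can be sketched as follows. This is not the pdf-merge.py code, just an illustration of the same idea with hypothetical names. It assumes the `pdftk file.pdf dump_data_utf8` output format (BookmarkBegin / BookmarkTitle / BookmarkLevel / BookmarkPageNumber lines), that page counts are known per input, and ASCII titles. It emits /OUT pdfmarks with pages shifted by the total page count of the preceding files, and /Count set to the number of direct children so nesting survives.

```python
from typing import Dict, List, Tuple

Bookmark = Tuple[str, int, int]  # (title, level, page)


def ps_string(text: str) -> str:
    return "(" + text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)") + ")"


def parse_pdftk_bookmarks(dump: str) -> List[Bookmark]:
    """Collect bookmarks from pdftk dump_data(_utf8) output."""
    marks: List[Bookmark] = []
    current: Dict[str, str] = {}
    for line in dump.splitlines():
        key, _, value = line.partition(": ")
        if key == "BookmarkBegin":
            current = {}
        elif key in ("BookmarkTitle", "BookmarkLevel", "BookmarkPageNumber"):
            current[key] = value
            if len(current) == 3:  # all three fields seen for this bookmark
                marks.append((current["BookmarkTitle"],
                              int(current["BookmarkLevel"]),
                              int(current["BookmarkPageNumber"])))
                current = {}
    return marks


def outline_pdfmarks(bookmarks: List[Bookmark], page_offset: int) -> List[str]:
    lines = []
    for i, (title, level, page) in enumerate(bookmarks):
        # /Count = direct children: entries exactly one level deeper that
        # appear before the next entry at this level or higher.
        children = 0
        for _, lvl, _ in bookmarks[i + 1:]:
            if lvl <= level:
                break
            if lvl == level + 1:
                children += 1
        count = f" /Count {children}" if children else ""
        lines.append(f"[ /Title {ps_string(title)} /Page {page + page_offset}{count} /OUT pdfmark")
    return lines


def merged_outlines(docs: List[Tuple[str, int]]) -> str:
    """docs: (dump_data text, page count) per input file, in merge order."""
    lines: List[str] = []
    offset = 0
    for dump, n_pages in docs:
        lines += outline_pdfmarks(parse_pdftk_bookmarks(dump), offset)
        offset += n_pages  # the next file starts after all pages merged so far
    return "\n".join(lines) + "\n"
```

The resulting text is the pdfmark file for the second pass; the first pass still has to get rid of the original bookmarks, as described above.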