How to generate pdfmark which opens at specific paragraph? - pdf

We have the requirement to open a PDF user manual from our application at a context sensitive position. Since the manual is authored in MS-Word (2013), I managed so far to use a PRINT field and set a pdfmark:
PRINT "[/Dest /MyReferenceName /View [ /XYZ null null null ] /DEST pdfmark" \* MERGEFORMAT
When generating the PDF afterwards with e.g. FreePDF, I am able to open the document by adding the following to the Acrobat call:
/A "nameddest=MyReferenceName"
Only drawback is that the document always opens at the top of the page, not at the line where I inserted the PRINT field. So my question is
How can I create a pdfmark which then opens at a specific paragraph?
Is there an easier solution than what I did to solve the requirement under the given conditions?

Related

can I create a PDF from ImageMagick which will always open at zoom 100%? [duplicate]

I am running into an issue with PNG to PDF conversion.
Actually I have big PNG files not in size but in contents.
In PDF conversion it creates a big PDF files. I don't have any issue with its quality, but whenever I try to open this PDF in PDF viewer, it opens in "Fit to Page" mode.
So, I can't see the created PDF in the initial view, but I need to zoom it up to 100%.
My question is: can I create a PDF which will always open at zoom 100% ?
You can possibly achieve what you want with the help of Ghostscript.
Ghostscript supports to insert PostScript snippets into its command line parameters via -c "...[PostScript code here]...".
PostScript has a special operator called pdfmark. This operator is not understood by most PostScript interpreters, but is understood by Acrobat Distiller and (for most of its parameters) also by Ghostscript when generating PDFs.
So you could try to insert
-c "[ /PageMode /UseNone /Page 1 /View [/XYZ null null 1] \
/PageLayout /SinglePage /DOCVIEW pdfmark"
into a PDF->PDF conversion Ghostscript command line.
Please take note about various basic things concerning this snippet:
The contents of the command line snippet appears to be 'unbalanced' regarding the [ and ] operators/keywords. But it is not! The initial [ is balanced by the final pdfmark keyword. (Don't ask -- I did not define this syntax...)
The 'inner' [ ... ] brackets delimit an array representing the page /View settings you desire.
Not all PDF viewers do respect the view settings embedded in the PDF file (Acrobat software does!).
Most PDF viewers allow users to override the view settings embedded in PDF files (Acrobat software also does this). That is, you can tell your viewer to never respect any settings from the PDF files it opens, but f.e. to always open it with "fit to width".
Some specific things about this snippet:
The page mode /UseNone means: the document displays without bookmarks or thumbnails. It could be replaced by
/UseOutlines (to display bookmarks also, not just the pages)
/UseThumbs (to display thumbnail images of the pages, not just the pages
/FullScreen (to open document in full screen mode)
The array for the view mode constructed as [/XYZ <left> <top> <zoom>] means: The zoom factor is 1 (=100%), the left distance from the page origin is the special 'null' value, which means to keep the previously user-set value; the top distance from the page origin is also 'null'. This array could be replaced by
/Fit (to adapt the page to the current window size)
/FitB (to adapt the visible page content to the current window size)
/FitH <top>' (to adapt the page width to the current window width);` indicates the required distance from page origin to upper edge of window.
...plus several others I cannot remember right now.
So to change the settings of an existing PDF file, you could do the following:
gs \
-o out.pdf \
-sDEVICE=pdfwrite \
-c "[ /PageMode /UseNone /Page 1 /View [ /XYZ null null 1 ] " \
-c " /PageLayout /SinglePage /DOCVIEW pdfmark" \
-f in.pdf
To check if the Ghostscript command worked, open the PDF in a text editor which is capable of handling binary files. Search for the /View or the /PageMode keywords and check if they are there, inserted as values into the PDF root object.
If it worked, check if your PDF viewer honors the settings. If it doesn't honor them, see if there is an overriding setting within the viewers preference settings.
I did a quick test run on a sample PDF of mine. Here is how the PDF root object's dictionary looks now, checked with the help of pdf-parser.py:
pdf-parser-beta.py -s Catalog a.pdf
obj 1 0
Type: /Catalog
Referencing: 3 0 R, 9 0 R
<<
/Type /Catalog
/Pages 3 0 R
/PageMode /UseNone
/Page 1
/View [/XYZ null null 1]
/PageLayout /SinglePage
/Metadata 9 0 R
>>
To learn more about the pdfmark operator, google for 'pdfmark reference filetype:pdf'. You should be able to find it on the Adobe website and elsewhere:
https://www.google.de/search?q=pdfmark%20reference%20filetype%3Apdf&oq=pdfmark%20reference%20filetype%3Apdf
In order to let ImageMagick create a PDF as you want it, you may be able to hack the file defining your delegate settings. For more help about this topic see for example here:
http://www.imagemagick.org/Usage/files/#delegates
PDF specification supports this functionality in this way: create a GoTo action that goes to first page and sets the zoom level to 100% and then set the action as the document open action.
How exactly you implement it in real life depends very much on the tool you use to create the PDF file. I do not know if ImageMagick can create such actions.

Set PDF Document Properties Initial View

I am try to do set the Document Properties - > Initial View after creating PDF file from Post Script File.
Input PDF Initial View:
After my setting the Initial of the PDF file will as below:
This only I am trying to do by automation.
With the help of pdfmark Reference (Adobe SDK) I am tried to set DOCVIEW in post scrip
Post Script:
[ /PageMode /UseOutlines /Page 1 /View [/Fit] /DOCVIEW pdfmark
The above script will set
Document Properties - > Intial View
Document Option
Show: Bookmarks Panel and Page
Page Layout: Default
Magnification: Fit Page
Window Options
Show : File Name
But my expectation is as shown below,
Document Properties - > Intial View
Document Option
Show: Bookmarks Panel and Page
Page Layout: Continuous
Magnification: Fit Width
Window Options
List item
Show : Document Title
So please give me advice.
What is the post script for those things.
else any other way to set those by automation.
Regards
Thirusanguraja
I am got the solution for my question, by the postscript only i am done this job
Postscript Script:
[ /PageMode /UseOutlines /View [/FitH -32768]/Page 1 /DOCVIEW pdfmark
[ {Catalog} <</ViewerPreferences<</DisplayDocTitle true>>/PageLayout/OneColumn>> /PUT pdfmark
Above postscript code was i am used to get my PDF Document Properties - > Initial View as
Document Option
Show: Bookmarks Panel and Page
Page Layout: Continuous
Magnification: Fit Width
Window Options
Show : Document Title
I hope my statement is true.
Regards,
Thirusanguraja Venkatesan
There do not appear to be pdfmarks definitions to cover what you want (or at least I can't find them in the current pdfmark specification. That being the case you cannot create a PDF with those properties via pdfmark.
One of the features you want seems to be covered by the /DisplayDocTitle entry in a ViewerPreferences dictionary, but the rest I can't see anything obvious in the spec. I'm not convinced these are under control of the PDF but if they are, your best bet is to inspect one that does what you want.

Export PDF page labels on command line

I'd like to export the page-labels stored in some PDF documents for easy parsing. I know I could dig into the PDF document after having it converted with qpdf, but this seems like overkill.
Is there no commandline tool that will simply print the page label for each page (or together with other meta-data)? I know that PDFSpy will export the label, but $300 isn't an option, preferably the solution should be free.
Short answer:
I am not aware of any (free) tool that can 'simply print' the page label for each page.
Also, you'll not be able to evade the expansion compressed objects and object streams, using a tool like qpdf or one with equivalent capabilities.
Long answer:
There's no such tool because these are the only a few things you can safely rely on when it comes to page labels. These are the following:
Each PDF document must contain a root object.
That root object must be of /Type /Catalog.
The document's trailer will show where to find the object using the key /Root followed by the indirect object number reference.
IF a PDF document uses non-standard page labels, then the document root object must have an entry named /PageLabels.
Here is where it stops to be relatively easy. Because the object the /PageLabels key refers to may be contained in a compressed object stream. This means that you'd have to expand that object stream.
If you really succeeded to get the description of the page labels as ASCII, you'll discover that it's not an easily parseable flat list (like a dictionary is): it is a number tree.
I'll not go into the details of these complexities, because it would take a very long article to describe all possible variations. You better read it up directly in the official ISO PDF-1.7 specification.
But instead I'll give you an example in ASCII PDF code:
213 0 obj
<< /Type /Catalog
/PageLabels
<<
/Nums
[
0 << % start labeling from page no. 1
/S /r % label with lowercase roman numbers
>>
7 << % start new labeling from page no. 8
/S /D % label with standard decimal numbers
>>
11 << % start labeling page no. 12
/S /D % label with decimal numbers...
/P (ABCD-) % ...but using label prefix 'ABCD-'...
/St 3 % ...followed by '3' as the start decimal.
>>
]
>>
%%...........................
%%...more root object keys...
%%...........................
>>
endobj
The above example will label the pages number 1, 2, 3, ... (last) like this:
i
ii
iii
iv
v
vi
1
2
3
4
ABCD-3
ABCD-4
ABCD-5
ABCD-6
...and so on until last page...
As you can see, the PDF method of labeling pages (mapping page numbers to page names) is completely non-intuitive. You can only understand it by studying the PDF specification.
I've written a small command-line utility based on Poppler that does just this task: https://github.com/HeimMatthias/pdfpagelabels
Disclaimer: I'm the OP and created the original post under a different account. I have been using the solution via pdftk (listed in a comment above) successfully for years in my implementation. However, last year it was time to reimplement our system from scratch and we've had numerous instances where the pdf-tk output could not be parsed by our implementation.
The new command-line tool follows the philosophy of doing just one thing, but doing it well, and simply prints the page labels of all or selected pages of a pdf-file. If anyone finds this useful, and stumbles upon it here, all the better for it.

Is there an "Export to Pdf" plugin for Tiddlywiki?

Has anyone put together a plugin or tool for exporting a Tiddlywiki to pdf?
No, there isn't.
As a workaround, I write or find a decent printable stylesheet, then print to PDF.
Why not select the target tiddler to "Open in new window", and print it to PDF with any installed PDF printer?
To accomplish this I used a tool to convert HTML to PDF. These steps are a bit long but well worth it. Once you've got it working it is easily repeated.
In each tiddler that I want in my PDF, I mark with a specific tag; I used TableOfContents.
In each tiddler that is marked with this tag, I added an order field--to be used to define the order of tiddlers to appear in the PDF.
Ensure your HTML headers are properly defined for the document. I think tiddler titles use <h2>, so properly defining subheadings using <h3><h4> etc will ensure, if you want, a nice auto-generated Table of Contents in your PDF.
If you want each tiddler to start on a new page (in the PDF), we need to add this HTML to the end of each tiddler:
<div style = "display:block; clear:both; page-break-after:always;"></div>
With a completed TiddlyWiki document export the tiddlers to a single HTML file--this will be used to generate a PDF document. To export, go to the AdvancedSearch, select the Filter tab. In the search textbox enter your filter criteria--for me that was:
[tag[TableOfContents]sort[order]]
You'll see, immediately, on-screen a list of the tiddlers the system found based on that criteria. Then click on the Export icon and select Static HTML.
Optionally, but I think it's a great idea, manually create a cover page (in your favorite editor)--this will be a single HTML file to act as the cover page in the PDF document; call it cover.html. More on this later.
Download and install wkhtmltopdf (command-line tool to generate PDF from an HTML file).
https://wkhtmltopdf.org/downloads.html
Learn and get familiar with the wkhtmltopdf command line syntax. There are numerous features here so the command you end up with maybe lengthy. Use wkhtmltopdf /? to view general help, then wkhtmltopdf --extended-help to view details (well worth the read).
Generate a PDF document. At the command prompt navigate to the folder where your TiddlyWiki document is located. Here is a list of my favorite command-line switches. My app is installed in C:\Program Files..., so my command line starts with that...
"c:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe"
Add this switch for a header on the left:
--header-left "My document title"
For a header on the right:
--header-right "v1.0.0.1"
Font size of header:
--header-font-size 8
Display a line below the header:
--header-line
Spacing between header and content in mm (default 0):
--header-spacing 5
A left-footer ([section] is replaced with the name of the current section:
--footer-left "[section]"
A centered footer:
--footer-center "Page [page] of [topage]"
Footer font size:
--footer-font-size 8
Footer spacing:
--footer-spacing 5
If you want titles to hyperlink (in the PDF) to go back to the TOC:
--enable-toc-back-links
Make sure no background images get printed:
--no-background
I added special styles in the TiddlyWiki document for print media--to hide tags and clean up the spacing. Then I used this switch to ensure print media is used:
--print-media-type
Being in North America I want letter-size pages; I think the default is A4:
-s Letter
IMPORTANT--give the tool access to local files, otherwise your images will be missing in the PDF:
--enable-local-file-access
Use this if you want to have a cover page (see step 6 above):
cover "cover.htm"
And use this if you want a TOC automatically generated. Without a cover page, the TOC will be your first page, so create a cover page:
toc
After the toc identify your exported tiddler HTML file as input to the tool:
tiddlers.html
And, the final argument on the command line is the output PDF file name:
MyDocument.pdf
Export the tid to html.
Then in the terminal, issue:
html2pdf $myTid.html $myTid.pdf
$myTid is only a var and can be any name
:)

Programmatically add comments to PDF header

Has anyone had any success with adding additional information to a PDF file?
We have an electronic medical record system which produces medical documents for our users. In the past, those documents have been Print-To-File (.prn) files which we have fed to a system that displayed them as part of an enterprise medical record.
Now the hospital's enterprise medical record vendor wants to receive the documents as PDF, but still wants all of the same information stored in the header.
Honestly, we can't figure out how to put information into a PDF file that doesn't break the PDF file.
Here is the start of one of our PDFs...
%PDF-1.4
%âãÏÓ
6 0 obj
<<
/Type /XObject
/Subtype /Image
/BitsPerComponent 8
/Width 854
/Height 130
/ColorSpace /DeviceRGB
/Filter /DCTDecode
/Length 17734>>
stream
In our PRN files, we would insert information like this:
%MRN% TEST000001
%ACCT% TEST0000000000001
%DATE% 01/01/2009^16:44
%DOC_TYPE% Clinical
%DOC_NUM% 192837475
%DOC_VER% 1
My question is, can I insert this information into a PDF in a manner which allows the document server to perform post-processing, yet is NOT visible to the doctor who views the PDF?
Thank you,
David Walker
Yes, you can. Any line in a PDF file that starts with a percent sign is a comment and as such ignored (the first two lines of the PDF actually are comments as well). So you can pretty much insert your information into the PDF as you did into the PRN.
However:
The PDF format works with byte position references, so if you insert data into a finished PDF file, this will push the rest of the data away from their original position and thus break the file. You can also not append it to the file, because a PDF file has to end with
startxref
123456
%%EOF
(the 123456 is an example). You could insert your data right before these three lines. The byte position of the "startxref" part is never referenced anywhere, so you won't break anything if you push this final part towards the end.
Edit: This of course assumes there is no checksumming, signing or encryption going on. That would make things more complicated.
Edit 2: As Javier pointed out correctly, you can also just add your data to the end and just add a copy of the three lines to the end of that. Boils down to the same thing, but it's a little easier.
PDFs are supposed to have multiple versions just appending at the end; but the very end must have the offset to the main reference table. Just read the last three lines, append your data and reattach the original ending.
You can either remove the original ending or let it there. PDF readers will just go to the end and use the second-to-last line to find the reference table.
Have you ever thought to embed your additional info inside the PDF as a separate file?
The generic PDF specification allows to "attach files" to PDFs. Attached files can be anything: *.txt, *.doc, *.xsl, *.html or even .pdf. Attached files are contained in the PDF "container" file without corrupting the container's own content. (Special-purpose PDF specifications such as PDF/A- and PDF/X-* may impose some restrictions about embedded/attached files.)
That allows you to tie additional info and/or data to PDF files and allow for common storage and processing. Attached files are supposed to not disturb any PDF viewer's rendering.
I've used that feature frequently, for various purposes:
store the parent document (like .doc) inside the .pdf from which the .pdf was created in the first place;
tag a job ticketing information to a printfile that is sent to the printshop;
etc.pp.
Of course, recently discovered and published flaws in PDF processing software (and in the PDF spec itself) suggest to stay away from embedding/attaching binary files to PDF files --
because more and more Readers will by default stop you from easily extracting/detaching the embedded/attached files.
However, there is no reason why you shouldn't be able to put your additional info into a medical-record-info.txt file of arbitrary lenght and internal format and attach it to the PDF:
MRN TEST000001
ACCT TEST0000000000001
DATE 2009-01-01
TIME 16:44:33.76
DOC_TYPE Clinical
DOC_NUM 192837475
DOC_VER 1
MORE_INFO blah blah
Hi, guys,
can you please process this file faster than usual? If you don't,
someone will be dying.
Seriously, David.
FWIW, the commandline tools pdftk.exe (Windows) and pdftk (Linux) are able to attach and detach embedded files from their container PDF. Acrobat Reader can also handle attachments.
You could setup/program/script your document server handling the PDF to automatically detach the embedded .txt file and trigger actions according to its content.
Of course, the doctor who views the PDF would be able to see there is a file attachment in the PDF. But it wouldn't appear in his "normal" viewing. He'd have to take specific additional actions in order to extract and view it. (And then there is the option to set a password on the PDF to protect it from un-authorized file detachments. And/or encode, obscure, rot13 the .txt. Not exactly rock-solid methods, but 99% of doctors wouldn't be able to accomplish it even if you teach them how to...)
You can still insert comments into a PDF file using the % character. But anyone would be able to access with a text editor.
Your vendor could remove these comments after post-processing, so it doesn't actually get to the doctors.
You can store the data as real PDF metadata. For example, with CAM::PDF you can write metadata like this:
use CAM::PDF;
my $pdf = CAM::PDF->new('temp.pdf') || die;
my $info = $pdf->getValue($pdf->{trailer}->{Info}) || die;
$info->{PRN} = CAM::PDF::Node->new('dictionary', {
DOC_TYPE => CAM::PDF::Node->new('string', 'Clinical'),
DOC_NUM => CAM::PDF::Node->new('number', 192837475),
DOC_VER => CAM::PDF::Node->new('number', 1),
});
$pdf->cleanoutput('out.pdf');
The Info node of the PDF then looks like this:
8 0 obj
<< /CreationDate (D:20080916083455-04'00')
/ModDate (D:20080916083729-04'00')
/PRN << /DOC_NUM 192837475 /DOC_TYPE (Clinical) /DOC_VER 1 >> >>
endobj
You can read the PRN data back out like so (simplistic code...)
my $pdf = CAM::PDF->new('out.pdf') || die;
my $info = $pdf->getValue($pdf->{trailer}->{Info}) || die;
my $prn = $info->{PRN};
if ($prn) {
my $prndict = $pdf->getValue($prn);
for my $key (sort keys %{$prndict}) {
print "$key = ", $pdf->getValue($prndict->{$key}), "\n";
}
}
Which makes output like this:
DOC_NUM = 192837475
DOC_TYPE = Clinical
DOC_VER = 1
PDF supports arbitrarily nested arrays, dictionaries and references so just about any data can be represented. For example, I built an entire filesystem embedded in a PDF just for fun!
At one point we were changing some Acrobat JS code by doing a text replace in a plain (unencrypted) PDF. The trick was that the lengths of each PDF block were hard coded in the document. So, we could not change the number of characters. We would just add extra spaces.
It worked great, the JS code executed an all.
Have you thought about using XMP?