How to create a PDF document with header from a template with docx4j?

I want to create a document from an existing Word 2010 document and convert it to PDF using docx4j 3.1.0. I've built upon the sample in
https://github.com/plutext/docx4j/blob/master/src/samples/docx4j/org/docx4j/samples/ConvertOutPDF.java
The Word document already contains a header with text and an image that I do not modify in my processing. The resulting PDF document, however, doesn't contain the header.
Is this something that is supposed to work? If yes: how can I find out what I am missing?

Yes, if you can see the header when you "save as PDF" in Word, then you should also see the header in docx4j's PDF output.
To have it fixed, we'll need to see the docx.

Just for the curious reader: the missing header turned out to be caused by a wrong approach to setting the page margins on the document. Instead of modifying the existing settings via body.getSectPr().getPgMar() (or, even simpler, setting them in the template right away), the code created new PageDimensions and set a new SectPr on the body, thereby somehow overwriting or removing the header.
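A likely explanation is that a section's header references (w:headerReference) are children of its w:sectPr element, so swapping in a fresh SectPr discards them. The following is a minimal sketch of the "modify in place" idea on raw WordprocessingML, not docx4j code; the XML fragment, the rId7 relationship id and the set_margins helper are made up for illustration.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"

# Hypothetical section properties, as they might appear at the end of word/document.xml.
SECT_PR = f"""<w:sectPr xmlns:w="{W}" xmlns:r="{R}">
  <w:headerReference w:type="default" r:id="rId7"/>
  <w:pgSz w:w="11906" w:h="16838"/>
  <w:pgMar w:top="1440" w:right="1440" w:bottom="1440" w:left="1440"
           w:header="708" w:footer="708" w:gutter="0"/>
</w:sectPr>"""


def set_margins(sect_pr, **margins_twips):
    """Update attributes on the existing w:pgMar; all other children stay untouched."""
    pg_mar = sect_pr.find(f"{{{W}}}pgMar")
    for side, value in margins_twips.items():
        pg_mar.set(f"{{{W}}}{side}", str(value))
    return sect_pr


sect_pr = set_margins(ET.fromstring(SECT_PR), left=720, right=720)

# The header reference is still present because the sectPr itself was never replaced.
header_refs = sect_pr.findall(f"{{{W}}}headerReference")
```

In docx4j terms this corresponds to changing the object returned by body.getSectPr().getPgMar(), as described above, rather than calling a setter with a newly built SectPr.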

Related

Creating dynamic PDF internal links with BIRT

I am trying to create a PDF in BIRT and I need to have bookmarks linking from a summary page to each detail page. The links work fine in the HTML preview and a similar http link works in published PDFs. However, the internal links do not work in the PDF format.
What I have tried so far is setting the bookmark property to "detail_" + row["nodeid"] and setting the hyperlink to the same. As stated, this works for the HTML preview, but not the PDF export.
The PDF has automatically generated TOC items that I would prefer to leverage off, but I don't know how to link to those.
Is there a way that I can get the PDF output to contain the required links using either bookmark properties, or the generated TOC items?
Sample PDF output (Customer data removed, alternate locations selected)
The solution to the problem lies not in the format of the bookmark/hyperlink, but in the placement of the bookmark.
The problem was, I was placing the bookmark on the row of the table I wanted to link to. Instead, the bookmark needed to be on the label in the first column of the row.
I believe the issue is that in the HTML version the table row is a <tr> tag, whereas in the PDF the row doesn't physically exist, so there's nothing to set the bookmark on. The label/text item, however, exists in both versions, so the bookmark is created correctly.

Page Templates with Form XObject in PDF

I'm writing a PDF generation library and wanted to add the ability to use other PDFs as templates. The specification notes that a TemplateInstantiated property on pages, with the alias of the template object, should be all that is needed.
Here is a gist of the pdf content:
https://gist.github.com/tyre/89c12f8203181f078001
The template itself is stored in object 16 and the page in object 19.
qpdf --check reports the PDF as invalid:
WARNING: tmp/alpaca.pdf: file is damaged
WARNING: tmp/alpaca.pdf (file position 32089): xref not found
WARNING: tmp/alpaca.pdf: Attempting to reconstruct cross-reference table
checking tmp/alpaca.pdf
PDF Version: 1.7
File is not encrypted
File is not linearized
I'm afraid your PDF document is completely and utterly broken and that you have misunderstood a number of key concepts. You cannot simply incorporate a complete PDF file into another PDF file in the way you have done and expect that to work.
The template system you are referring to is intended to include "hidden" pages - not referenced in the pages tree in the PDF file - in the context of an interactive form document (or interactive document in general). That doesn't sound like what you are intending to do. And these pages need to be valid PDF pages. In other words, you cannot just include the original PDF document verbatim and expect the PDF reader to sort things out; you need to insert a syntactically correct PDF page object.
What you want to do is take the content of a document and apply that as a background to a document. This most commonly is done using XObjects. Pseudo-code for this could be:
Open the original PDF document
Open the "template" document
Read the template document and copy all elements from the template page into a newly created XObject in the original PDF document.
Modify the page contents of the pages in the original PDF document to paint the new XObject at the beginning of the page description of the existing pages.
It's important to note that again, you're not supposed to simply insert the template document into the stream for the newly created XObject. You will have to create a valid XObject that contains a properly formed resources dictionary referencing all resources needed by your XObject, and that contains the content stream from your template document.
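To make the target structure concrete, here is a minimal sketch in plain Python (no PDF library) that writes a one-page PDF whose content stream first paints a form XObject and then draws the page's own content. It does not copy anything out of an existing template PDF, which would need a real parser; the template content is a hard-coded grey rectangle, and the names build_pdf, stream and /Tpl are invented for this example. The sketch also records each object's byte offset so that the cross-reference table and startxref value are consistent.

```python
def stream(dictionary, data):
    """Wrap raw bytes as a PDF stream object body with a correct /Length."""
    return b"<< %s /Length %d >>\nstream\n%s\nendstream" % (dictionary, len(data), data)


def build_pdf(objects):
    """Serialize objects numbered from 1 and record byte offsets for the xref table."""
    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for num, body in enumerate(objects, start=1):
        offsets.append(len(out))
        out += b"%d 0 obj\n" % num + body + b"\nendobj\n"
    xref_pos = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"          # entry for the free object 0
    for off in offsets:
        out += b"%010d 00000 n \n" % off     # each entry is exactly 20 bytes
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\n" % (len(objects) + 1)
    out += b"startxref\n%d\n%%%%EOF\n" % xref_pos
    return bytes(out), offsets, xref_pos


# Stand-in for the template page's content: a light grey filled rectangle.
template_content = b"0.9 0.9 0.9 rg 10 10 180 180 re f"
# Page content: paint the template first, then the page's own drawing on top.
page_content = b"q /Tpl Do Q\n0 0 0 rg 80 80 40 40 re f"

objects = [
    b"<< /Type /Catalog /Pages 2 0 R >>",
    b"<< /Type /Pages /Kids [3 0 R] /Count 1 >>",
    b"<< /Type /Page /Parent 2 0 R /MediaBox [0 0 200 200] "
    b"/Resources << /XObject << /Tpl 4 0 R >> >> /Contents 5 0 R >>",
    # The form XObject carries its own (here empty) resources dictionary.
    stream(b"/Type /XObject /Subtype /Form /BBox [0 0 200 200] /Resources << >>",
           template_content),
    stream(b"", page_content),
]

pdf, offsets, xref_pos = build_pdf(objects)
```

Writing pdf to a file gives something a checker can inspect. A real template would additionally need its fonts, images and other resources copied into the XObject's /Resources dictionary.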
As already indicated in comments, the PDF presented by the OP is structurally defective: the cross-reference table position and entries are wrong. Furthermore, the transition from one PDF revision to the next update looks questionable. Essentially, therefore, the OP will have to provide a sample PDF which is at least syntactically correct.
That being said, though, the OP indicated he was
writing a PDF generation library and wanted to add the ability to use other PDFs as templates. The specification notes that a TemplateInstantiated property on pages, with the alias of the template object, should be all that is needed.
The Named Pages mechanism is not meant for something like that. Its main current use (if it is used at all) is in the context of spawning page templates by Acroform actions.
For using pages from other PDFs, one can simply copy them (and the referenced other objects) from the source PDF if they are to be used as separate pages as is; and if multiple templates are to be put onto a single target page, one can wrap the copied sources into form xobjects and include them in the target page.

Is there any PDF version which allows automatic (or manual) addition of the HTTP source of the document?

Is there any PDF version which allows automatic (or manual) addition of the HTTP source of the document?
From the user's side, the scenario looks like this:
I find the desired document in PDF format on the web.
I save it.
A few months later I open this document and want to find the web page where I found it.
It would be nice to have the address of that file stored somewhere. Of course it could be written manually in some text file, but there are usually problems with copying and pasting the titles of PDF documents.
If you can modify your PDF files before sending them to the browser, then there are several places where you could put the URL where the document came from:
You could use a node in the "logical structure" tree (chapter 14 part 7 of the PDF reference document). This tree will show up in Acrobat Reader in the "Model Tree" tab.
You could add a hyperlink annotation to the top or bottom of each page, or the first page, or in a new page that you can add at the beginning or at the end of the file. I personally think this is the best approach since the link will be click-able.
You could add a button field on a page that fires a URI action pointing at the source URL (a GoTo action only jumps to a destination inside the document). Actions are explained in chapter 12 - Interactive Features of the PDF reference document.
You could add a bookmark (outline) that points to a named destination that is linked to the source URL. Named Destinations are also explained in chapter 12. This approach also needs just one click, and it is possible to hide the bookmarks tab if it is not otherwise used.
You could add it as a Document property as @Bobrovsky said.
PDF allows you to add custom values to the document information dictionary (see 14.3.3, "Document Information Dictionary" in PDF Reference). You might put your URL there. Adobe Reader will show custom values in the Document Properties dialog on the Advanced tab.
Starting from PDF 1.4 (Acrobat 5.x and later) you might add the URL to the XMP metadata stream referenced from the document catalog (see 14.3 Metadata in PDF Reference). Adobe Reader will show metadata properties too if you put them in a custom schema.
Acrobat Professional could be used to add custom values or XMP metadata. Almost any PDF library that can open and save PDFs could be used for the task too.
I think there are no other places in a PDF document that you can use to store your information.
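As a rough illustration of the information-dictionary option, the sketch below only formats the dictionary bytes; writing it into a file and pointing the trailer's /Info entry at it is left to whatever PDF library is in use. The key name /SourceURL and the helper names are hypothetical (custom keys are allowed, but this particular one is not defined by the specification), and the input is assumed to be plain ASCII.

```python
def pdf_string(text):
    """Encode ASCII text as a PDF literal string, escaping backslashes and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return b"(" + escaped.encode("ascii") + b")"


def info_dict(source_url, title=None):
    """Build the body of a document information dictionary with a custom URL entry."""
    entries = []
    if title is not None:
        entries.append(b"/Title " + pdf_string(title))
    entries.append(b"/SourceURL " + pdf_string(source_url))  # hypothetical custom key
    return b"<< " + b" ".join(entries) + b" >>"


info = info_dict("http://example.org/doc (v2).pdf", title="Example")
```

Escaping matters because URLs and titles may contain parentheses, which would otherwise terminate or unbalance the PDF string.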
PDF Reference

RDLC rendered to PDF ignores Strikethrough formatting

So, I have a local .rdlc file with some text formatted using strikethrough formatting. My issue is quite simple to explain, but I do not know if it is just a limitation of PDF, or a bug with the .rdlc exporting to PDF.
When I write this code:
var localReport = new LocalReport();
...
byte[] pdf = localReport.Render("PDF");
System.IO.File.WriteAllBytes("MyReport.pdf", pdf);
None of the strikethrough-formatted text transfers over to the .pdf file properly.
If instead, I export to Word using .Render("Word"), the strikethrough does work on the .doc format. So, I know it isn't a problem with the .rdlc report itself.
Has anyone encountered this? Any solutions or workarounds?
I found this: http://social.msdn.microsoft.com/Forums/en-US/sqlreportingservices/thread/b35ca474-046d-4a38-a765-6c38c3d33105/
which suggests that missing strikethrough in PDFs was a known limitation. (But as mentioned in comments to the question, I couldn't reproduce with 2008r2.)
The two workarounds given there look painful.
(A) finding a font which itself has the strikethrough built into each glyph/character. (B) trying to mimic a strikethrough using a line report item. Note that for (B) overlapping items are supported only in PDF, Print & TIFF formats.
I suppose if it were mine, I would play around with option B if the amount of text is small. Also, it may be worth testing the HTML passthrough that is enabled when a placeholder is set to render as HTML. Maybe using a strikethrough style there would work?
I faced this issue while exporting an RDLC report to Word. While fetching the data, I replaced the style for strike formatting with the HTML strike tag, and it worked.

Is it possible to hyperlink to external PDF selection?

I'd like to create hyperlink references to a text selection or offset in an external PDF document, as if there were an anchor defined. E.g., http://lib.extern.org/doc1.pdf?page=3&paragraph=4 so that when the user follows the link in their browser, the PDF document opens positioned at the offset specified. I'm looking for any granularity, e.g. page, paragraph, line, word, character or even pixel em or inch offset would be acceptable. If a range for a selection could be specified that would be ideal, the purpose being to highlight and link directly to quotes in external PDF docs (to which the app has read-only access.) Seems so basic, but I haven't found a solution. Ideas?
It appears it works for page numbers
EDIT: And also check this (it should open on page 8)
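The example links in this answer did not survive. One widely used way to target a page is the URL fragment form from Adobe's documented open parameters, such as #page=N; whether that is what the original links used is an assumption, and support varies between viewers and browser plugins. A small sketch for building such links (pdf_link is a made-up helper name):

```python
from urllib.parse import urldefrag, urlencode


def pdf_link(url, page=None, nameddest=None, zoom=None):
    """Return url with a fragment of PDF open parameters, replacing any old fragment."""
    base, _old_fragment = urldefrag(url)
    candidates = (("page", page), ("nameddest", nameddest), ("zoom", zoom))
    params = {key: value for key, value in candidates if value is not None}
    return f"{base}#{urlencode(params)}" if params else base


link = pdf_link("http://lib.extern.org/doc1.pdf", page=3)
```

Note that these parameters go after `#`, not after `?` as in the question's example, so they are interpreted by the viewer and never sent to the server. There is no parameter here for paragraph or character offsets; page and named destination are the granularity this mechanism offers.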
There are no standards for commandline parameters for the plugins.
But if you can render a fresh PDF each time (make a copy and put in a new object via some PDF manipulation API), you can include an OpenAction that jumps to the page in question. You can even set more viewer parameters (or do some other personalization, watermarks, whatever...).