PDF - why is there no standard structure element for a page? - pdf

The PDF Spec defines standard structure types, used to define a structure tree for the document. As far as I can see, there is no element related to pages. Here are the standard structure types for grouping elements:
Document
Part
Art
Sect
Div
...and so on...
Why is there no Page item in this list?
If you want your structure to use pages, what should be used? Part? Sect? Div?

PDF tags exist so that the content type / meaning of elements can be identified. They should be considering a kind of "meta" information for the PDF, simply providing context for the content in a file (so that content can be easily extracted, converted, processed, accessible, etc.). Think of it as a table of contents to a book. Just because the book has x pages doesn't mean that the content structure would be altered if the book's page height was cut in half and now had 2x pages in it.
A Page Object in the PDF Document Structure already groups elements (by nature of each element being on a given page), so doing so in this structure would be a little redundant.
Also, consider this case:
Document
Table of Contents (Page 1)
Section 1 (starts on page 2, ends mid page 3)
Sub Section (page 2)
Sub Section (half of page 3)
Section 2 (starts mid page 3)
etc...
In this example, Section 1 and Section 2 couldn't both be direct parents of page 3 (not to mention that Section 1 spans two different pages). Additionally, trying to solve this problem really isn't necessary because the elements which is being grouped here is already each a child of its respective Document Structure's Page node in the actual file format.

Appendix G of the PDF Specification gives examples that demonstrate use of the Page object.

The PDF has a tree structure (which is what allows it to load any page so fast). The content does not have any structure unless you choose to use the marked content feature of the format which then allows metadata to be include in the data.

Related

How to get all Wikipedia page links with their pageIDs?

Starting a request like that:
https://en.wikipedia.org/w/api.php?action=query&format=json&titles=Title&prop=links&pllimit=500
provides me a list of links (that the page contains) where every link consists of the title and the ns (namespace)
Is there a way to also get the PageID together with title & ns? (the less work it is for the sever the better of course)
You need to use generator parameter. Here is an example for Cobra Wikipedia page.
https://en.wikipedia.org/w/api.php?action=query&generator=links&titles=Cobra&prop=info&gpllimit=500

Netsuite PDF Templating: get number of pages as attribute

I am templating pdfs in Netsuite using freemarker and I want to display the footer only on the last page. I have been doing some research, but couldn't find a solution (since looks like the environment does not allow me to include or import libs), so I thought that just comparing the number of the page with the total pages in an if tag would be a nice and easy workaround. I already know how to display the numbers by using the <pagenumber/> and <totalpages/> tags, but still cannot get them as values so I can use them like this:
<#if (pagenumber == totalpages) >
... footer html...
</#if>
Any ideas of how or where can I get those values from?
The approach you are trying won't work, because you are mixing BFO and Freemarker syntax. Netsuite uses two different "engines" to process PDF Templates. The first step is Freemarker, which merges the record fields with your template and produces an XML file, which is then converted by BFO into a PDF file. The <totalpages/> element is meaningless to Freemarker, as it is only converted into a number by BFO later.
Unfortunately, the ability to add a footer to only the last page of a document is currently a limitation of BFO, as per the BFO FAQ:
At the moment we do not have a facility for explicitly assigning a
footer or header to the last page in a document when the number of
pages is unknown.
You CAN add it after a page break - and put the page break at the end of the body
<pbr footer="nlfooter" footer-height="25%"></pbr>
</body>
The issue here is - on a one page output - you will get 2 pages minimum... it will always ADD a page for the disclaimer / footer...

Documentation with Diagram "Hyperlinks" in Enterprise Architect?

I'm struggling to get all the required (and only the required) information into the documentation of my Enterprise Architect Project. Precisely: we have modelled various requirements and displayed the source "standards" for these requirements in our diagrams by using the "hyperlink"-element out of the common toolbox. (This allows us to capture a title, the website where the documentation is found and a description of this documentation).
Now this element is visible on the diagram, but not in the package-view of our model and it does not get generated in our word (docx) documentation.
I can see that it should be possible to get this in the documentation, because a "Model Report" which basically prints everything does print the hyperlinks. But I can't find what I have to select in my template (in the package-tree view, as a package field, element field or diagram field) in order to get this printed. I can't just use the model report since this basically dumps the whole database in the document and reverse-engineering this model report has proven too difficult for me. Actually I would expect this to be in some kind of documentation for EA, but could not find such a thing with this level of detail... is there, is there a reproducible way of finding such things out in further cases? (btw I'm using EA 11.0)
[sorry there were illustrations here, but I'm not allowed to upload them...]
As Geert has already noted, there is a difference between "proper" elements and diagram-only elements. This is actually reflected in the document template editor, where there is an "Element" section inside the "Diagram" section. This will produce output for all elements in the diagram, whether or not they are also in the project browser.
Here's an example of the information you can pull out of your hyperlinks. Given a diagram with a hyperlink:
... and a template which outputs name, alias and hyperlink for each element in the diagram:
... EA will generate a document will the following contents:
So if you want the hyperlink to result in a hyperlink in the document, use the HyperlinkAlias field.
What might be a bit confusing is the fact that in addition to the Hyperlink element type in the Common diagram toolbox, EA allows you to create hyperlinks in regular elements (in the Element Properties dialog, Related tab: Files, which can be local files or web addresses).
In fact, I would recommend that you use those in your Requirement elements rather than diagram-only Hyperlinks if traceability is a priority in your model. The diagram-only Hyperlinks, on the other hand, give you a clearer visual.
Selecting a subset of the elements in a diagram ("only the required information") is a little more involved and depends on how your model is structured. Template fragments will get the job done, but you might be able to achieve your desired result by just using the filters in the document generation dialog.
The hyperlink is an element that is stored in the same package as the diagram it is used on, it is just not visible in the project browser (similar to a note element).
There's a good chance that it doesn't have a name, so make sure you don't omit nameless elements.
So if you print all the element of the package containing the diagram then you should be able to print the hyperlink as well.
In case that fails you might want to consider creating a template fragment based on an SQL query or a script. Those offer lots of flexibility to print whatever you need, even if it is located in a different package.
[Edited on 04.05.15 to reflect the comment by Uffe and provide a final solution]
Ok, based on Geerts answer, using the following custom query fragment in the diagram section:
select
t_object.ea_guid as CLASSGUID,
t_object.Object_Type as CLASSTYPE,
t_object.Object_Id as OBJECTID,
t_object.name as HL_Name,
t_object.Stereotype as HL_Stereotype,
t_object.object_type as HL_Type,
t_object.Alias as HL_Alias,
Note as Notes
--,t_object.*
from t_object
left join t_diagramobjects on (t_object.Object_ID = t_diagramobjects.Object_ID)
left join t_diagram on (t_diagram.Diagram_ID = t_diagramobjects.Diagram_ID)
where t_diagram.Diagram_ID = '#DIAGRAMID#'
and t_object.Object_Type='Text'
I was able to get a list of the hyperlinks following the diagram, this is the fragment:
custom >
{HL_Alias}: {HL_Name}
{Notes}
< custom
The "Notes" can be printed by getting the attribute directly out of the t_object table. Don't get confused as I was at first: the auto-completion on t_object and the results (t_object.*) DO NOT SHOW a Note-Attribute, but it does exist an when you write it into the query, it gets generated in the document.

How to detect a section in a wikimedia page dump

I have looked around quite a bit to try and answer this question, but to no avail. I am parsing wikimedia page dumps to process certain pages (yes, I am aware of several tools to parse wikimedia page dumps, but they don't work for me as well as my parser).
Question is simple. I know how to detect start of a section (e.g. "==External References=="). That's easy. What's not well defined is how to detect when a section ends? For example, for most sections I can scan until start of next section header, but that isn't reliable. I looked at wikimedia's help page on sections, but it doesn't say how to detect end of a section.
There is no "section end" marker in MediaWiki syntax. A section extends until the next section header of the same or lower level. (There is also a "section 0" containing all the text before the first section header.)
Yes, this implies that sections at different levels can overlap, as in this example:
This text is in section 0.
== Section 1 begins here ==
This text is in section 1.
=== Section 2 begins here ===
This text is in sections 1 and 2.
=== Section 3 begins here ===
This text is in sections 1 and 3.
== Section 4 begins here ==
This text is in section 4.
Note that headings created using the HTML <h1>, <h2>, etc. tags don't begin or end sections, and won't have section edit links, even though they look otherwise identical to section headings.
Section headings inside templates do get section edit links, which let you edit the corresponding section of the template, but they're treated specially and are not considered part of the normal section structure of the containing page. There are also some weird special cases here involving section headers inside template parameters which I don't fully remember off the top of my head.
The automatically generated first level heading at the top of every page also doesn't count as a section heading, although any extra first level headings created with = Heading = do.

Set the header for PDF document on second page

I have a jsp page that contains some data(This data may very because it is fetches from the data base) over it.Now I am generating the PDF file for the my jsp page using the ‘Itext api’ . Now I need to set the some specific header on this generated pages started from the Page 2 onwards.
For Ex :-
Test.jsp
Jsp test page contains the data for testing purpose.
Test.pdf
Jsp test page contains the data for testing purpose.
……
…
Page 1 ends
Page 2 starts
My title for the page..(Only needed for the second page onwards)**
Please help me...
Thanks in advance
You can probably make use of setSkipFirstHeader(boolean) of PdfPTable. Which will not print out the first occurrence of the header of your table. You will need to be sure to setHeaderRows on the table.
You could also subclass the PdfPageEventHelper and in the onEndPage event, use the passed in Document object to getPageNumber(). When you determine it is past the first page, you can add your header content to the document.