Manually add a new page in PDFBox using pdfbox-layout

I'm using pdfbox-layout to create and manage PDF documents through its Document API.
Document document = new Document();
It creates a new page automatically when text added through the Paragraph API overflows the current page.
Paragraph paragraph = new Paragraph();
However, I'm unable to add a new page manually when I need one. I want to print some content starting on a new page.

After going through the API's source code, I found that the add method accepts a parameter of type Element:
document.add(Element element)
After inspecting all the classes that implement the Element interface, I found the one I need: ControlElement.
So, to add a new page, my code looks like this:
document.add(ControlElement.NEWPAGE);

Related

Decide if I need a new PDPage using PDFBox

I understand that PDFBox offers only a very low-level API for PDF generation, and I need to manually decide if I need a new PDPage.
I'm trying to write simple logic that creates a new PDPage when the current page is full.
The only option I can think of is to check whether my Y pointer is about to go past the end of the page. But this needs to be done every time I add anything to the PDPageContentStream.
Is there a better approach than this?
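
One way to avoid repeating the check at every call site is to keep it in a single helper that tracks the Y position. The following is only a minimal sketch in plain Java: PageCursor and reserve are hypothetical names, not part of PDFBox, and the sketch assumes Y is measured from the bottom of the page. In real code, the branch marked in the comment is where you would close the current PDPageContentStream, add a new PDPage to the document, and open a new stream.

```java
// Hypothetical helper, not part of PDFBox: keeps the page-full check in one place.
final class PageCursor {
    private final float pageHeight;
    private final float margin;
    private float y;            // current Y position, measured from the bottom of the page
    private int pageCount = 1;

    PageCursor(float pageHeight, float margin) {
        this.pageHeight = pageHeight;
        this.margin = margin;
        this.y = pageHeight - margin;   // start just below the top margin
    }

    /**
     * Reserves vertical space for the next piece of content and returns the Y
     * position at which to draw it. Starts a new page first if the content
     * would cross the bottom margin.
     */
    float reserve(float height) {
        if (y - height < margin) {
            // Real code would close the stream, add a new PDPage and open a new stream here.
            pageCount++;
            y = pageHeight - margin;
        }
        y -= height;
        return y;
    }

    int pageCount() {
        return pageCount;
    }
}
```

Every drawing call then asks the cursor for its position (for example, float y = cursor.reserve(lineHeight);) instead of comparing Y against the page height itself.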

Link for new template for register

I am trying to create a second registration form (with slightly different fields). I have copied the template customer/register.liquid (to, say, customer/register.xyz.liquid). What is the URL to access this new page? That is, what link should I use instead of the original /account/register?
Any new template can be accessed on the front end by adding "?view=" followed by the template suffix to the page URL.
In your case, try /account/register?view=xyz

iText header for html

I am generating a PDF using iText 5.0.5. I read files of different MIME types (image, PDF, HTML content, etc.) from a database and generate a PDF from them.
There are two kinds of output: a user can view an individual document, or a collection of documents combined into a single generated PDF.
I have one problem with the header of PDFs generated from HTML content. The HTML comes from a text area on a form, where the header information is prepopulated and the user then types the rest of the document. At PDF generation time I use a page event to generate the header on each page, for every MIME type.
As a result, for HTML content the header appears twice. What I want is for the header not to be generated on the first page of an HTML document. I have a solution that works when I read an individual document, but it does not work for the final PDF that contains all the documents of different MIME types. Is there a way to skip the header on the first page of HTML content while still generating it on the remaining pages using the page event?
Please help.
Perhaps you could use two different page events when dealing with HTML: one that adds headers (the current one), and one that only sets the page event handler back to the original one.
You start off with the new one. The first page event comes along, and that new event handler changes the current page event handler. The remaining pages are stamped with headers as usual.
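
The hand-over described above can be sketched without any PDF library. This is a plain-Java simulation under stated assumptions: PageHandler, SketchWriter, HeaderHandler and SkipFirstPageHandler are hypothetical stand-ins, not iText classes, and how an existing handler is replaced in iText itself should be checked against its documentation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the library's page-event interface.
interface PageHandler {
    void onEndPage(SketchWriter writer, int pageNumber);
}

// Hypothetical stand-in for the writer that fires one event per finished page.
final class SketchWriter {
    private PageHandler handler;
    final List<String> log = new ArrayList<>();

    void setHandler(PageHandler handler) {
        this.handler = handler;
    }

    void render(int pages) {
        for (int page = 1; page <= pages; page++) {
            // The handler is looked up on every page, so a swap takes effect on the next page.
            handler.onEndPage(this, page);
        }
    }
}

// The existing handler: stamps a header on every page it sees.
final class HeaderHandler implements PageHandler {
    @Override
    public void onEndPage(SketchWriter writer, int pageNumber) {
        writer.log.add("header on page " + pageNumber);
    }
}

// The new handler: stamps nothing and installs the real handler for later pages.
final class SkipFirstPageHandler implements PageHandler {
    private final PageHandler next;

    SkipFirstPageHandler(PageHandler next) {
        this.next = next;
    }

    @Override
    public void onEndPage(SketchWriter writer, int pageNumber) {
        writer.setHandler(next);
    }
}
```

For an HTML document you would install new SkipFirstPageHandler(new HeaderHandler()) before rendering; for the other MIME types you would install HeaderHandler directly.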

Where can I get hyperlinks in the PDF document structure (other than the "Annots" entry in the page dictionary)?

I have two PDF documents (doc1 and doc2) with hyperlinks, e.g. www.somlink.com, www.somlink2.com.
According to the PDF Specification, I can get those hyperlinks via Link Annotations. Link Annotations can be found in a PDF page's dictionary under the "Annots" key.
CGPDFDictionaryRef pageDictionary = CGPDFPageGetDictionary(someCGPDFPage);
CGPDFArrayRef annots;
CGPDFDictionaryGetArray(pageDictionary, "Annots", &annots);
The problem is that in one PDF document (doc1) I get that "Annots" array, but in the other document (doc2) there is no such entry in the page dictionary.
Yet with PDFKit.framework you can get those annotations from the PDFPage class using the - (NSArray *)annotations method, even if there is no "Annots" entry in the page dictionary.
I can't use PDFKit.framework on iPad/iPhone so I am working with Quartz framework :)
So it seems that there is another place where hyperlinks (Link Annotations in the PDF Reference) can be specified, not only the "Annots" array, and PDFKit.framework somehow knows how to find them.
Any ideas where I can get those hyperlinks?
Links on a page THAT YOU CAN CLICK ON have to be annotations. Period. No annotations, no links.
A string of text "http://blah.com" isn't necessarily a link, it's just a piece of text describing a URL. This may be what's causing your confusion.
It's also possible to embed link actions in bookmarks. I'm not at all familiar with PDFKit or Quartz, so you're on your own as far as API calls are concerned.
And finally, (having reread your question), I believe annotations can be inherited from their parent Pages object. Gonna have to look that one up... Nope. The annotations array MUST be in the leaf page object, or it's not valid.
Can you post links to your PDFs? Something Ain't Right here.
A PDF viewer like Adobe Reader simply lets you click and navigate on plain text if it looks like a hyperlink, i.e. it starts with http://, https://, or ftp:// and ends at some URL delimiter such as a space. As simple as that ;)
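
If doc2 really has no link annotations and the viewer is only recognizing URL-looking text, the same heuristic can be approximated on page text you have extracted yourself. This is a minimal sketch, written in Java for illustration only (the question uses Quartz). UrlTextScanner and its pattern are assumptions, and real viewers may apply different rules. It treats a scheme followed by any run of non-whitespace characters as a URL, so trailing punctuation is included and scheme-less text such as www.somlink.com is not matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds URL-looking substrings in already-extracted page text.
final class UrlTextScanner {
    // A supported scheme, then everything up to the next whitespace character.
    private static final Pattern URL_LIKE = Pattern.compile("(?:https?|ftp)://\\S+");

    static List<String> findUrlLikeText(String pageText) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = URL_LIKE.matcher(pageText);
        while (matcher.find()) {
            hits.add(matcher.group());
        }
        return hits;
    }
}
```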

How to read/parse dynamically generated web content?

I need to find a way to write a program (in any language) that will connect to a website and read dynamically generated data from the website.
Note that it's dynamically generated: it's not enough to get the source HTML, because the data I'm interested in is generated via JavaScript that references back-end code. So when I view the webpage source, I can't see the data. (For example, go to Google and do a search. Check the source code of the search results page. Very little of the data your browser is displaying is reflected in the source; most of it is dynamically generated. I need some way to access this data.)
Pick a language and environment that includes an HTML renderer (e.g. .NET and the WebBrowser control). Use the HTML renderer to get the URL and produce an HTML DOM in memory (making sure that scripting is enabled). Read the contents of the HTML DOM after the renderer has done its work.
Example (you'll need to do this inside a class derived from System.Windows.Forms.Form):
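WebBrowser browser = new WebBrowser();
// Navigate is asynchronous, so read the DOM only once the document has loaded.
browser.DocumentCompleted += (sender, e) =>
{
    HtmlDocument document = browser.Document;
    // extract what you want from the document
};
browser.Navigate("http://www.google.com");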
I used to have a Perl program that accessed Mapguide.com to get driving directions from one location to another. I parsed the returned page and saved the result to a database. That works as long as the source never changes its format. The problem is that the source format often changes, and then your parser needs to change too.
A simple thought: if we're talking about AJAX, you can instead look up the URLs that serve the dynamic data. Then you can use the JavaScript on the page you're talking about to reformat it.
If you have Firefox/Greasemonkey, making a DOM dumper should be a simple matter.