Generated PDF is truncated - pdf

I have to convert badly formed HTML to PDF.
I converted the HTML file to XHTML with the Tidy class.
Then I generated my PDF with XMLWorkerHelper.
It works, but the generated PDF is not correct.
The images are missing, and for some files the text is truncated.
What specific configuration can I use to solve this problem?
This is the first time I have used these classes, and it's not easy.
Thanks for your help.

I have badly formed HTML files to convert to PDF.
I first used Tidy to convert them to XHTML, and then XMLWorkerHelper to generate the PDF.
I used itextpdf-5.4.2 and xmlworker-5.4.2.
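Document documentPDF = new Document();
PdfWriter writer = PdfWriter.getInstance(documentPDF, new FileOutputStream(pdfFilename));
documentPDF.open();
XMLWorkerHelper.getInstance().parseXHtml(writer, documentPDF, new FileInputStream(HTMLFileName));
// close() finalizes the PDF; without it the output file is incomplete
documentPDF.close();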
I can't post my file, it's too big.

Related

p:dataexporter columns with images

I want to export a datatable (with p:graphicImage components in its columns) in PrimeFaces, but even in the new version of PrimeFaces I can only see the image's alt value in my PDF file. Can't I export the graphicImage itself?
It's also asked in this link comment #3.
No, you can't, not without extending the exporter.
I can give you two ideas, based on how I managed to put images in a PDF:
1. Use a printer, then print to PDF and you're done. You will have your table with the pretty images.
2. Generate the PDF using iText, and you can customize it as much as you want.

PDFBox - document is empty after loading

I am using Apache PDFBox to render thumbnails of PDF documents. To do so, I load the PDF and use the first page as the thumbnail. The problem is that one particular document does not seem to be loaded correctly. For all other documents it works as expected.
ByteArrayInputStream is = new ByteArrayInputStream(pdfData);
PDDocument pdf = PDDocument.load(is, true);
List<PDPage> pages = pdf.getDocumentCatalog().getAllPages(); //pages is empty here
The PDF file has 238 pages and is around 6.5 MB in size.
Assuming that you're using a 1.8.* version, please use the non-sequential parser:
PDDocument pdf = PDDocument.loadNonSeq(is, null);
The non-sequential parser succeeds in certain cases where the old parser fails, e.g. for PDFs that have had revisions (example). Another advantage is that no extra code is needed for "protected" PDFs that are encrypted with the empty password.

Which xsl-fo elements can be the destination of a fo:basic-link with internal-destination?

I have a document in DocBook format. I used to generate HTML from it. Now I am trying to generate XSL-FO and then use Apache FOP to build a PDF. However, FOP emitted several warnings about missing link destinations, and those links don't work in the resulting PDF. The same links work fine in the HTML output. I can see that the supposedly missing ids are actually present in the XSL-FO. Some links do work, for example those with a DocBook section element as the destination. However, those that point to a DocBook table row element don't work.
Is this a DocBook bug? Or are there limitations in XSL-FO on which elements can be link destinations? A FOP bug? Or actually a limitation of the PDF format itself?
Embarrassing that I didn't find it before. It's a known FOP bug, FOP-2110.

Images not reproduced when converting a Blob from HTML to PDF [duplicate]

This question already has answers here:
Images not showing up on PDF created from HTML
(3 answers)
Closed 7 months ago.
I want to convert HTML emails into a PDF. I have written the following piece of code.
var txt = msgs[i].getBody();
/* We need two blob conversions - one from text to HTML and the other from HTML to PDF */
var blob = Utilities.newBlob(txt, 'text/html',"Test PDF");
Logger.log(txt);
var tempDoc = DocsList.createFile(blob);
var pdf = tempDoc.getAs('application/pdf');
pdf.setName('Email As PDF');
DocsList.createFile(pdf);
The code above first creates a Blob from the HTML of a Gmail message and then uses the getAs() function to convert it to a PDF. However, the images in the HTML do not appear in the PDF. Any ideas on how to get these images would be appreciated.
Any alternative ideas on how to convert a Gmail message to PDF are also welcome.
Interesting problem. It makes sense that this doesn't work: the PDF conversion doesn't bother "rendering" the HTML, so it never fetches the image src.
I did a quick test and confirmed that data URIs (inline images that don't require a separate HTTP call) do work.
So one hacky solution could be to fetch the images and then convert them to data URIs. This has a few downsides: the images are hard to find (a regex would be fragile or not comprehensive), it needs lots of UrlFetch calls (even with some caching, most automated email senders add trackers, so you end up re-fetching the same image), and it is slow.
Convert -
<img src="http://images.myserver.com/myimage.png..."/>
To (you can check the content type dynamically as well)-
<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhE..."/>
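The conversion described above can be sketched in a few lines. This is only an illustration, written in Java rather than Apps Script, and it rests on several assumptions: the class name InlineImages, the helper guessMimeType, and the fetcher parameter are all hypothetical, with fetcher standing in for whatever actually downloads the bytes (UrlFetch in the Apps Script case). The regex is deliberately simple and shares the fragility mentioned above.

```java
import java.util.Base64;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class InlineImages {

    // Matches src="http(s)://..." inside an <img> tag. Simple on purpose:
    // single-quoted or unquoted attributes are not handled.
    private static final Pattern IMG_SRC = Pattern.compile(
            "(<img\\b[^>]*?\\bsrc\\s*=\\s*\")(https?://[^\"]+)(\")",
            Pattern.CASE_INSENSITIVE);

    // Hypothetical helper: guesses the content type from the file extension.
    // A real implementation could read the Content-Type response header instead.
    static String guessMimeType(String url) {
        String path = url.toLowerCase();
        int query = path.indexOf('?');
        if (query >= 0) {
            path = path.substring(0, query);
        }
        if (path.endsWith(".jpg") || path.endsWith(".jpeg")) return "image/jpeg";
        if (path.endsWith(".gif")) return "image/gif";
        return "image/png"; // assumed default
    }

    // Replaces each remote image URL with a base64 data URI.
    // fetcher (an assumption) returns the image bytes, or null to leave the tag as-is.
    static String inlineImages(String html, Function<String, byte[]> fetcher) {
        Matcher m = IMG_SRC.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String url = m.group(2);
            byte[] bytes = fetcher.apply(url);
            String replacement = m.group(0);
            if (bytes != null) {
                replacement = m.group(1)
                        + "data:" + guessMimeType(url) + ";base64,"
                        + Base64.getEncoder().encodeToString(bytes)
                        + m.group(3);
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<img src=\"http://images.myserver.com/myimage.png\"/>";
        // Fake fetcher returning three bytes, so the sketch runs without network access.
        System.out.println(inlineImages(html, url -> new byte[] {1, 2, 3}));
    }
}
```

Passing the fetcher in as a function keeps the rewriting logic separate from the download step, so a caching layer can be added around the fetcher without touching the regex code.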

How to create pdf file from Qt application?

In my Qt application I am running some network tests, and I have to create a report from the test output. So I need to create the report in PDF format.
Can anybody please let me know how I can put my test results in a PDF file? My results contain graphs made with the Qwt library.
This code outputs a PDF from HTML:
QTextDocument doc;
doc.setHtml("<h1>hello, I'm an head</h1>");
QPrinter printer;
printer.setOutputFileName("c:\\temp\\file.pdf");
printer.setOutputFormat(QPrinter::PdfFormat);
doc.print(&printer);
printer.newPage();
I guess you can generate an HTML wrapper for your image and quickly print it that way. Otherwise, you can draw the image directly on the printer, since it is a paint device, in a similar fashion:
QPrinter printer;
// configure the printer before the painter starts painting on it
printer.setOutputFormat(QPrinter::PdfFormat);
printer.setOutputFileName("c:\\temp\\file.pdf");
QPainter painter(&printer);
painter.drawImage(QRect(0,0,100,100), <QImage loaded from your file>);
painter.end(); // finishes painting so the file is written