Images not reproduced when converting a Blob from HTML to PDF [duplicate] - pdf

I want to convert HTML emails into a PDF. I have written the following piece of code.
var txt = msgs[i].getBody();
/* We need two blob conversions - one from text to HTML and the other from HTML to PDF */
var blob = Utilities.newBlob(txt, 'text/html',"Test PDF");
Logger.log(txt);
var tempDoc = DocsList.createFile(blob);
var pdf = tempDoc.getAs('application/pdf');
pdf.setName('Email As PDF');
DocsList.createFile(pdf);
The code above first creates a Blob from the HTML of a Gmail message and then uses the getAs() function to convert it to a PDF. However, the images in the HTML do not appear in the PDF. Any ideas on how to get these images into the PDF would be appreciated.
Any alternative ideas on how to convert a Gmail message to PDF are also welcome.

Interesting problem. It makes sense that this doesn't work: the PDF conversion doesn't "render" the HTML, so it never fetches the image referenced by each src.
I ran a quick test and confirmed that data URIs (inline images that need no separate HTTP call) do show up in the PDF.
So one hacky solution is to fetch the images and convert them to data URIs. This has a few downsides: the images are hard to find (a regex would be fragile or not comprehensive), it takes a lot of UrlFetch calls (even with some caching, most automated email senders add trackers, so you end up re-fetching the same image), and it is slow.
Convert:
<img src="http://images.myserver.com/myimage.png..."/>
To (you can also determine the content type dynamically):
<img src="..."/>
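The conversion described above can be sketched in a few lines. This is a minimal illustration in Python rather than Apps Script, and it is not the answerer's code: the names IMG_SRC, to_data_uri, inline_images and fetch_with_urllib are hypothetical, and the regex carries exactly the fragility the answer warns about. The fetcher is passed in as a function so that each distinct URL is downloaded only once.

```python
import base64
import re
import urllib.request
from typing import Callable, Dict, Tuple

# Matches the src attribute of <img> tags that point at http(s) URLs.
# Assumption: a regex is good enough for a sketch; real email HTML may defeat it.
IMG_SRC = re.compile(
    r'(<img\b[^>]*?\bsrc=["\'])(https?://[^"\']+)(["\'])',
    re.IGNORECASE,
)


def to_data_uri(content: bytes, content_type: str) -> str:
    """Encode raw image bytes as a base64 data URI."""
    encoded = base64.b64encode(content).decode("ascii")
    return "data:{};base64,{}".format(content_type, encoded)


def fetch_with_urllib(url: str) -> Tuple[bytes, str]:
    """Hypothetical fetcher: download the image and report its content type."""
    with urllib.request.urlopen(url) as response:
        return response.read(), response.headers.get_content_type()


def inline_images(html: str, fetch: Callable[[str], Tuple[bytes, str]]) -> str:
    """Replace every remote <img> src in html with an inline data URI."""
    cache: Dict[str, str] = {}  # URL -> data URI, so each URL is fetched once

    def replace(match: "re.Match[str]") -> str:
        url = match.group(2)
        if url not in cache:
            content, content_type = fetch(url)
            cache[url] = to_data_uri(content, content_type)
        return match.group(1) + cache[url] + match.group(3)

    return IMG_SRC.sub(replace, html)
```

Calling inline_images(html, fetch_with_urllib) returns HTML whose remote images are embedded. The cache only helps when the same URL repeats, so per-recipient tracker URLs will still be fetched individually.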

Related

How can I generate and download PDFs in Flask without saving them in my web app?

I'm trying to generate a PDF with the ReportLab module, which works fine so far. My problem is that the .build() method saves the PDF in my web app's directory. What I want is to send the PDF for download without saving it first. That is somehow possible with the wkhtmltopdf module, but I don't want to use any other servers for this.
The process would be: the user presses a "download as pdf" button, and a PDF is generated and instantly returned as a download without being saved first.
Do you know if this is possible?
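You want to create the PDF server side and return it to the client without saving it anywhere (for example, in S3)?
Yes, this is possible. Create the PDF in memory using
import io
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas
buffer = io.BytesIO()
myPDF = canvas.Canvas(buffer, pagesize=letter)
Then, after drawing your PDF, save it and rewind the buffer using
myPDF.save()
buffer.seek(0)
Then, when you are ready to return the PDF as a response, you can return it with the following (HttpResponse is Django's response class):
response = HttpResponse(buffer, content_type='application/pdf')
response['Content-Disposition'] = 'attachment; filename="{}"'.format("myFile.pdf")
return response
In Flask, the equivalent is send_file from the flask package (download_name needs Flask 2.0 or later; older versions call it attachment_filename):
return send_file(buffer, mimetype='application/pdf', as_attachment=True, download_name='myFile.pdf')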

How to get rid of unwanted extra pages when converting a Google document to PDF via google-apps-script?

I have an old script that (among other things) converts a Google document to PDF.
It used to work fine, but now two extra blank pages appear in the PDF version of the file.
I just discovered that this problem also affects the "download as pdf" menu option in Google Documents. There are a number of workarounds in that case, but I need a workaround for google-apps-script.
In this post the solution to a similar problem seems to involve fine-tuning the page size. I tried something like that, but it does not apply in any straightforward way.
I also tried some other (kind of random) variations for the page size and margins, but to no avail.
Below I'm pasting a minimal working example. It should create a document file "test" and its pdf version "test.pdf" in your main drive folder.
Any help getting rid of the two extra pages is greatly appreciated.
Thanks
function myFunction() {
  // this function
  // - creates a google document "test",
  // - writes "this is a test" inside it
  // - saves and closes the document
  // - creates a pdf version of the document, called "test.pdf"
  //
  // the conversion is ok, except two extra blank pages appear in the pdf version.

  // create google document
  var doc = DocumentApp.create('test');
  var docFile = DriveApp.getFileById(doc.getId());

  // set margins (I need landscape layout)
  // this is an attempt at a solution, inspired by https://stackoverflow.com/questions/18426817/extra-blank-page-when-converting-html-to-pdf
  var body = doc.getBody();
  body.setPageHeight(595.2).setPageWidth(841.8);
  var mrg = 40; // in points
  body.setMarginTop(mrg).setMarginBottom(mrg);
  body.setMarginLeft(mrg).setMarginRight(mrg);

  // write something
  body.appendParagraph('this is a test')
      .setHeading(DocumentApp.ParagraphHeading.HEADING2)
      .setAlignment(DocumentApp.HorizontalAlignment.CENTER);

  // save and close file
  doc.saveAndClose();

  // convert file to pdf
  var docblob = docFile.getAs('application/pdf');

  // set pdf name
  docblob.setName("test.pdf");

  // save pdf file
  var file = DriveApp.createFile(docblob);
}
I found the source of the problem and a solution in this post on the Google product forum, dating back 8 months.
The extra pages appear in the PDF if the View -> Print layout option is not checked.
I did some further tests with my accounts and my colleagues'.
The results are consistent:
1. When View -> Print layout is not checked, two extra pages appear in the PDF version of the document.
2. When View -> Print layout is checked, the PDF version of the document has the expected number of pages.
3. This setting also affects the DocumentApp service in Google Apps Script. That is, the above script produces the expected PDF only if the View -> Print layout option in Google Documents is checked.
I do not see how this behaviour could be a "feature", so I think it's a bug. By the way, "Print layout" does not seem to have any visible effect on my documents (other than messing up the PDF version). I'm surprised that after 8 months the bug is still out there.
Number 3 above surprised me, because I did not think that an option set manually in a (any) Google document would affect my scripts.
I'm currently looking for a way to set the "Print layout" option from inside the script. So far I have had no luck with that.

Ghostscript - create a pdf with multiple identical pages and keep size down

I'm trying to use Ghostscript to create a PDF with multiple identical pages. I will later use this together with another multi-page PDF to stamp unique information onto every page.
Is it possible to use Ghostscript to create such a PDF and keep the size of the final file down? Maybe there is a flag I have not noticed that can do this better than the script below?
I have tried a regular merge command like the one below, but the size of the resulting PDF grows a lot: the original file of 2,061MB, merged into a 100-page PDF, results in a final size of 46,117MB.
"C:\Program Files\gs\gs9.20\bin\gswin64.exe"^
-dBATCH^
-dNOPAUSE^
-q^
-sDEVICE=pdfwrite^
-sOutputFile=outputpdf.pdf^
"inputpdf.pdf"^
"inputpdf.pdf"^
"inputpdf.pdf"(and so on 100 times)
You can construct such a file manually easily enough, and it will be much smaller, by reusing the page content stream for each page.
However, Ghostscript's pdfwrite device won't do that, not least because it can't: it cannot know in advance that the page it's about to receive is the same as the previous page. As a result, it will create a new page content stream for each page and write new content to it.
Note that resources (forms, patterns, colour spaces, image XObjects, etc.) which are used on each page will be reused on other pages.
However, it seems to me that you're already getting nearly a 5:1 ratio (2k * 100 pages = 200Kb, the final file is 46Kb), though in fairness a good bit of that 2Kb is 'stuff' around the page.
Without seeing your input file I can't really comment any further, but frankly I doubt it's possible to make it any smaller without hand-crafting the file. What's the problem with a 46Kb file anyway?
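To illustrate what "reusing the page content stream for each page" means, here is a minimal Python sketch that hand-writes a PDF in which every page object points at the same content stream (object 4). It is not Ghostscript output and not the answerer's code. The function name build_repeated_pdf, the page size, the font and the sample text are all assumptions, and the text must not contain parentheses or backslashes because no string escaping is done.

```python
def build_repeated_pdf(page_count: int, text: str = "Identical page") -> bytes:
    """Build a PDF whose pages all share one content stream (object 4)."""
    # The single content stream: draw one line of text near the top of the page.
    content = "BT /F1 24 Tf 72 720 Td ({}) Tj ET".format(text).encode("latin-1")

    # Objects 1-4 are shared; page objects start at number 5.
    kids = " ".join("{} 0 R".format(5 + i) for i in range(page_count))
    objects = [
        b"<< /Type /Catalog /Pages 2 0 R >>",
        # MediaBox and Resources are inherited by every page from this node.
        (
            "<< /Type /Pages /Kids [{}] /Count {} /MediaBox [0 0 612 792] "
            "/Resources << /Font << /F1 3 0 R >> >> >>"
        ).format(kids, page_count).encode("ascii"),
        b"<< /Type /Font /Subtype /Type1 /BaseFont /Helvetica >>",
        b"<< /Length %d >>\nstream\n" % len(content) + content + b"\nendstream",
    ]
    # Every page is a tiny dictionary that references the same stream.
    objects += [b"<< /Type /Page /Parent 2 0 R /Contents 4 0 R >>"] * page_count

    out = bytearray(b"%PDF-1.4\n")
    offsets = []
    for number, body in enumerate(objects, start=1):
        offsets.append(len(out))  # byte offset of this object for the xref table
        out += b"%d 0 obj\n" % number + body + b"\nendobj\n"

    # Cross-reference table: each entry is exactly 20 bytes long.
    xref_start = len(out)
    out += b"xref\n0 %d\n" % (len(objects) + 1)
    out += b"0000000000 65535 f \n"
    for offset in offsets:
        out += b"%010d 00000 n \n" % offset
    out += b"trailer\n<< /Size %d /Root 1 0 R >>\nstartxref\n%d\n%%%%EOF\n" % (
        len(objects) + 1,
        xref_start,
    )
    return bytes(out)
```

Writing build_repeated_pdf(100) to a file gives a 100-page document in which each additional page costs only a short page dictionary and one cross-reference entry. Doing the same with a real input file means copying its content stream and resources by hand or with a PDF library, which is the hand-crafting the answer refers to.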

Generated PDF is truncated

I have to convert badly formed HTML into PDF.
I converted the HTML file into XHTML with the Tidy class.
Then I generated my PDF with XMLWorkerHelper.
It works, but the generated PDF is not correct.
The images are missing, and the text is truncated in certain files.
What specific configuration should I use to solve this problem?
It is the first time I have used these classes, and it's not easy.
Thanks for your help.
I have badly formed HTML files to convert into PDF.
I therefore first used Tidy to format them as XHTML and then XMLWorkerHelper to generate the PDF.
I used itextpdf-5.4.2 and xmlworker-5.4.2.
PdfWriter writer = PdfWriter.getInstance(documentPDF, new FileOutputStream(pdfFilename));
documentPDF.open();
XMLWorkerHelper.getInstance().parseXHtml(writer, documentPDF, new FileInputStream(HTMLFileName));
I can't post my file, it's too big.

How can I generate and download a pdf file in WebDynpro for ABAP?

I've got a task to create a Web Dynpro application that, given some inputs, generates a PDF file with questions, which the user should then be able to download. My question is: how can I generate a PDF in Web Dynpro, and how do I prompt the download?
I do not know how to do it with Adobe Forms, but I have certainly seen it done using SmartForms.
When you execute the function module assigned to a SmartForm, it has an EXPORTING parameter, job_output_info.
You then pass the OTF data from this parameter to the function module CONVERT_OTF with the following parameters.
CALL FUNCTION 'CONVERT_OTF'
  EXPORTING
    format   = 'PDF'
  IMPORTING
    bin_file = e_file_as_xstring
  TABLES
    otf      = job_output_info-otfdata[]
    lines    = lt_pdf_file_lines.
Then, if you are using Web Dynpro for ABAP, use the following method to let the user download the file.
wdr_task=>client_window->client->attach_file_to_response(
  i_filename  = 'Filename.pdf'
  i_content   = e_file_as_xstring
  i_mime_type = 'application/pdf'
).
I am not sure how it works with Adobe Forms, but if you are able to generate the OTF content, you should be able to do the same there. On the other hand, you may be able to get the PDF directly as an xstring, in which case the OTF part is not needed at all.
This article may help you convert the Adobe Form to an xstring: Getting a PDF in an xstring format in the ABAP environment