Decreasing the size of a PDF when using puppeteer for PDF generation

We are using IDR to convert PDF documents to HTML.
After making some modifications, we use puppeteer to convert the document back to PDF, but the resulting files are larger than the originals (even if I make no changes to the HTML).
For example, if the original file is 500 KB, the regenerated file is 1000 KB.
The page contains only some text.
Please help me understand the reason for this and how to solve it.

Related

Print webpage with canvas content

I have a tool that allows you to assemble reports generated with fabricjs.
We are trying to convert those reports, which currently work on the web, to PDF using the sejda.com tool.
The problem with sejda is that it has a generation time limit of 1:50. After that time, the web page returns an error, and we have reports that take longer than 1:50 to generate completely.
I'm looking for other options, but most don't interpret the HTML5 content, and the page comes out blank.
I have tried html2pdf, javascript2pdf, and a dozen websites that take a URL and try to print the document, all without success: they produce blank pages.
Is there any solution to our problem? We have been investigating this for months. One option is to improve the load times of the reports, but that is a complex development effort because of how they are built.
To solve this, I converted the PDF to PNG with PDF.js and stored the result of toDataURL, then retrieved the PNG image with fromURL and set it as the background.
Alternatively, this link may help: Load PDF into fabricjs canvas
We have finally decided to cache the reports.
Now everything works correctly.

How do I re-use the same image payload in several pages without repeating it?

I'm using tcpdf to generate a report that contains a logo from a svg vector image.
My goal is to efficiently re-use the same image payload over and over in the report, not storing the logo as if it was a different image on each page.
Right now, with the current data, the report generates 32 pages. The file size considerably increases with new pages being added. This seems to be due to the logo being repeated on every page.
I don't have tools to analyze what is inside the PDF, but reports generated by other applications show a different pattern: for PDFs containing repeated images, the file size jumps on the first page and then increases only slightly with each additional page, indicating that the first logo is efficiently re-used.
How can I achieve that using tcpdf?
If in my report, I place the logo only in page 1 and omit it in pages 2 - 32, still outputting all the text data, the file size is greatly reduced, just as in the examples that I mentioned before. This indicates that the svg data is repeated on every page.
Following example 009 in tcpdf's site documentation, I've tried loading the image from a file and also tried using a "data stream" (that is, encoding the svg in base64 and, instead of referencing the image from a file, passing the base64 text as a stream that contains the image payload).
I thought that using the data stream would take care of it, but it didn't.
Is there a way to reference the same image over and over in tcpdf?
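The size pattern described above (a jump on the first page, then only slight growth) is what you get when a writer stores each distinct image payload once and lets every page hold a small reference to it. The following is a minimal Python sketch of that idea only. It is not TCPDF code, and the names ImageStore and build_report are hypothetical, introduced purely for illustration.

```python
import hashlib
from typing import Dict, List, Tuple


class ImageStore:
    """Toy model of a PDF writer's image table (hypothetical, not TCPDF's API)."""

    def __init__(self) -> None:
        self._ids: Dict[str, int] = {}    # content hash -> object id
        self.payloads: List[bytes] = []   # each distinct payload is kept once

    def add(self, payload: bytes) -> int:
        """Return the object id for payload, storing it only the first time."""
        digest = hashlib.sha256(payload).hexdigest()
        if digest not in self._ids:
            self._ids[digest] = len(self.payloads)
            self.payloads.append(payload)
        return self._ids[digest]

    def stored_bytes(self) -> int:
        """Total bytes of image data actually kept in the store."""
        return sum(len(p) for p in self.payloads)


def build_report(logo: bytes, page_count: int) -> Tuple[ImageStore, List[int]]:
    """Place the same logo on every page; each page keeps only a reference id."""
    store = ImageStore()
    page_refs = [store.add(logo) for _ in range(page_count)]
    return store, page_refs
```

With a 32-page report, the store in this sketch ends up holding a single copy of the logo while every page refers to the same id. In TCPDF itself, the feature that appears closest to this is XObject templates (startTemplate, endTemplate, printTemplate); treat that as a pointer to check against the TCPDF documentation, not as a confirmed fix for SVG logos.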

Is it possible to convert fabricjs svg output to pdf without rasterizing?

We are building a web app where the user can make a design by using fabric.js and at the end he should receive a pdf file with his work.
At first, we tried to use JSPDF because a client-side solution was preferred. However, by doing pdf.addImage(canvas.toDataURL(),...) we are rasterizing the design.
Next, we tried a server-side solution using WKHTMLTOPDF, sending canvas.toSVG(), but there are some issues with font and shape rendering.
The designs are complex as they can have text, shapes, images and svg.
We also tried INKSCAPE (inkscape --without-gui --export-pdf ...), MPDF and MUPDF without good results. IMAGEMAGICK is not a solution, as it also rasterizes the design.
The main goal is to get a vector pdf that can be scaled up and in which the elements of the design are selectable. If possible, that pdf should also be ready to print (300 dpi and cmyk).
Yes, it's possible using the TCPDF library.
Please check the ImageSVG API example below for more information on converting SVG to PDF.
https://tcpdf.org/examples/example_058/
Export the canvas to svg and use pdflib to make the pdf.
You can find an example here: https://www.pdflib.com/pdflib-cookbook/graphics/starter-svg/

White image while inserting a SVG image in TCPDF

I'm trying to insert some SVG images in a PDF using TCPDF with the method TCPDF::ImageSVG, but when I try this I get a white space.
If I try to enable TCPDF::setRasterizeVectorImages the image shows in the PDF file, but it is rasterized of course and so its quality is not good.
Do you have any idea?
Thank you very much for your help!
Unfortunately, TCPDF's SVG handling is quite limited, and the cause of your issue depends on the SVG you are trying to use. Later versions of TCPDF support more SVG functionality, so if you haven't done so, try using a later version of TCPDF.
If an update doesn't resolve the issue, and you're forced to use raster images, you can improve quality at the cost of file size. You can do this by rasterizing them at a high DPI yourself outside of TCPDF. Once you've done this, take your new high-resolution raster image and add it to your PDF with the Image method like any other raster image. At work we usually rasterize to 300dpi, but your application may call for more or less.
If your image gets added to the PDF far larger on the page than you expected, specify at least one of the dimensions so TCPDF knows how much of the page you're intending the image to use.
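To decide how large the raster image needs to be before adding it, the arithmetic is pixels = inches × DPI. Here is a small Python sketch of that calculation, assuming you know the size the image should occupy on the page in millimetres. The function name pixels_for_print is hypothetical, and the 300 dpi default simply mirrors the figure mentioned above.

```python
MM_PER_INCH = 25.4


def pixels_for_print(width_mm: float, height_mm: float, dpi: int = 300):
    """Return (width_px, height_px) to rasterize at for the given printed size.

    Values are rounded to the nearest whole pixel.
    """
    width_px = round(width_mm / MM_PER_INCH * dpi)
    height_px = round(height_mm / MM_PER_INCH * dpi)
    return width_px, height_px
```

For instance, an image meant to print at 50.8 mm by 25.4 mm (2 in by 1 in) would need 600 by 300 pixels at 300 dpi, and half that in each dimension at 150 dpi.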

PDFBox : Converting to image : Quality loss when converting PDF containing scanned documents

My use case is pretty simple: I need to convert PDFs to images. I tried using Apache PDFBox, and I am having trouble converting PDFs that contain scanned images. When I convert a scanned page, the image clarity is lost due to compression/scaling, so I was trying to extract the image data from the PDF and store it directly.
The problem is that I may get PDF files that contain both images and text, in which case I would need to fall back to image conversion mode. How can I differentiate between pages/documents that contain only an image and ones with composite data?
I was thinking I could use the ProcSet definition for this purpose, but it appears to be marked as obsolete and unreliable in the PDF specification. The other possibility is to check all the objects linked to the page and see whether it contains anything other than images. Please let me know if there is an easier way of doing this.
Thanks
If your intention is to convert PDFs to images, it is better to use ImageMagick for that. ImageMagick has many options for controlling the quality of the image, and converting a PDF to an image with it is pretty simple.
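For the question's other idea, checking whether a page contains anything besides images, one rough approach is to scan the page's decoded content stream for operators. The Python sketch below assumes you already have the decoded content stream bytes from your PDF library. It treats Do and BI as image operators, and text-showing or path-painting operators as "something else". The function names and the three labels are hypothetical. It is deliberately naive: Do can also invoke a form XObject (which you would have to resolve through the page resources), and binary inline image data can produce false matches.

```python
import re

TEXT_OPS = {b"Tj", b"TJ", b"'", b'"'}
PAINT_OPS = {b"S", b"s", b"f", b"F", b"f*", b"B", b"B*", b"b", b"b*"}
IMAGE_OPS = {b"Do", b"BI"}


def strip_literal_strings(stream: bytes) -> bytes:
    """Drop (...) strings, honouring nesting and backslash escapes,
    so that text inside them is never mistaken for an operator."""
    out = bytearray()
    depth = 0
    i = 0
    while i < len(stream):
        c = stream[i:i + 1]
        if depth:
            if c == b"\\":
                i += 2          # skip the escaped byte as well
                continue
            if c == b"(":
                depth += 1
            elif c == b")":
                depth -= 1
        elif c == b"(":
            depth = 1
            out += b" "         # keep neighbouring tokens separated
        else:
            out += c
        i += 1
    return bytes(out)


def operators(stream: bytes) -> set:
    """Return the set of bare tokens left after removing operands
    that could look like operators (strings, hex strings, names, comments)."""
    data = strip_literal_strings(stream)
    data = re.sub(rb"<[0-9A-Fa-f\s]*>", b" ", data)        # hex strings
    data = re.sub(rb"/[^\s()<>\[\]{}/%]*", b" ", data)     # names like /Im0
    data = re.sub(rb"%[^\r\n]*", b" ", data)               # comments
    return set(re.split(rb"[\s\[\]<>{}]+", data))


def classify_page(stream: bytes) -> str:
    """Label a content stream as 'image-only', 'composite', or 'no-image'."""
    ops = operators(stream)
    has_image = bool(ops & IMAGE_OPS)
    has_other = bool(ops & (TEXT_OPS | PAINT_OPS))
    if has_image and not has_other:
        return "image-only"
    if has_image:
        return "composite"
    return "no-image"
```

Pages labelled "image-only" would be candidates for direct image extraction, and the rest would fall back to rendering. Note that scanned pages carrying an OCR text layer would come out as "composite" under this check, since the hidden text still uses text-showing operators.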