generating pdf files with php - pdf

After some work with PHPExcel, I finally got it to generate sheets of 3,000 cells in ~5 seconds by using a big array.
With the same data, I need to generate some PDF files. I tried to do it with PHPExcel, but it is not a good choice: generating a PDF file with PHPExcel took a lot of time and a lot of resources.
I also tried the html2pdf PHP library. A file containing a table with 3,000 cells took 20 seconds to generate.
I can't find a good solution to this. Do you know any good library? Do you know any good practices for generating PDF files faster, with a low load on the server side?

You can use the FPDF library to generate PDF files in a fast manner and you can use the Write HTML tables add-on to achieve what you want (see example at the bottom of the page).

PHPExcel uses TCPDF to generate PDFs, the same as HTML2PDF does with PHP5:
"HTML2PDF is a HTML to PDF converter written in PHP4 (use FPDF), and PHP5 (use TCPDF)."
I think that when generating a PDF, PHPExcel first generates an XLS, then converts it to HTML, then converts that to PDF. Not very efficient.
That is why using HTML2PDF directly cuts the time to 20 seconds.
--
To cut the waiting time even more, you could try another library, like dompdf, and keep skipping PHPExcel when what you need is a PDF.
If your table doesn't have formulas, you can generate all the content in an array once, then pass it to one function that generates an XLS with PHPExcel and to another that generates a PDF.
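The "build the array once, render it twice" idea can be sketched as follows. This is only an illustration in Python, not PHP: the function names are hypothetical, and the CSV and HTML renderers are stand-ins for the PHPExcel writer and the PDF library, which would each receive the same array.

```python
import csv
import io
from html import escape


def build_rows(n_rows, n_cols):
    """Build the table content once (placeholder cell values)."""
    return [[f"r{r}c{c}" for c in range(n_cols)] for r in range(n_rows)]


def render_csv(rows):
    """Stand-in for the spreadsheet writer: consumes the shared array."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


def render_html_table(rows):
    """Stand-in for the PDF path: markup an HTML-to-PDF library could take."""
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(cell))}</td>" for cell in row) + "</tr>"
        for row in rows
    )
    return f"<table>{body}</table>"


rows = build_rows(300, 10)            # 3,000 cells, computed a single time
sheet_text = render_csv(rows)         # spreadsheet output
table_html = render_html_table(rows)  # input for the PDF step
```

Neither renderer depends on the other, so the PDF path never has to go through the spreadsheet library.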

Related

Is it possible to obfuscate PDF file binary data?

Is it possible to obfuscate the bytes that are visible when a PDF file is opened with a hex editor? Also, I wonder if there is any problem in viewing the contents of the PDF file even if it is obfuscated.
You will always be able to see whatever bytes are within a file using a hex editor.
There might be ways to generate your pdf pages using methods that don't involve directly writing the text into the pdf (for example using javascript that's obfuscated).
As answered above, the bytes of the file are always visible when viewed with a hex editor. However, there are some options to hide or protect data in the file:
You could encrypt either the whole PDF or parts of its data. Note that encryption and decryption always require a secret. When the file is fully encrypted, it cannot be read without the key.
You can add additional, similar data frames but set them to be invisible in the PDF. Note that this technique blows up the size of the file.
You can use scripting languages that dynamically build up your PDF. Be aware that this could look suspicious to users or to anti-virus software.
You can use steganography tools to hide your data, for example steghide.
You can simply compress the data streams in the PDF, e.g. using gzip or similar compression tools. That way the content cannot be read directly. However, compression is easy to recognize and anyone can decompress it.
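To illustrate the last point, here is a minimal sketch using Python's zlib module, which implements the same deflate format that PDF's Flate stream filter uses. The content stream below is a made-up example, not taken from a real file.

```python
import zlib

# A made-up page content stream, repeated so that compression clearly pays off.
content_stream = b"BT /F1 12 Tf 72 720 Td (Hello, PDF) Tj ET\n" * 20

# What ends up in the file: the text is no longer readable in a hex editor.
compressed = zlib.compress(content_stream)

# What anyone can do with the bytes: one call restores the original.
restored = zlib.decompress(compressed)
```

The compressed bytes hide the text from a casual look, but `restored` equals the original stream, so this is obscurity rather than protection.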

Scribus, IText reduce file size

I'm generating PDFs using iText within a webapp. The PDF files come from Scribus and are huge (2MB for one page). The current approach is that the PDF has lots of form fields, which then get populated by iText (AcroForms, etc.).
The individual PDF generated by Scribus is 2MB. It could be as small as 150K; I know this because I ran Ghostscript on it. See below.
For large files (some could be 150 pages), the server bogs down, and often no PDF is produced.
Ghostscript will reduce the file to 150K per page. But I can't run that as a post-process if the PDF generation never completes. If I run Ghostscript on the initial PDF that gets fed to iText, the form fields go away, and the result is an empty form.
So I need either a way to run Ghostscript without losing the form fields (or another external tool that does the same thing), or a way for iText to populate a PDF via some means other than form fields. Is there any iText feature equivalent to good old JavaScript's document.getElementById('xyz').innerHTML = "new text";?
Of course, the absolute best solution would be an export option in Scribus that would simply not do the "place one glyph at a time" that they are so proud of.

WordML to PDF conversion

We receive WordML documents, which are basically XML files generated from MS Word docs and which also contain all the formatting instructions. We now have a requirement to convert these files to PDF. I looked at iText XMLWorker for this conversion. It simply removed all the XML tags and gave me all the content as a single paragraph in the PDF, with no formatting.
How can I make sure that the generated PDF contains the text with the correct formatting from the WordML doc?
iText's product XMLWorker requires you to handle each XML element manually (unless you have HTML as input). The XML schema for MS Word documents is extremely complicated, so you'd be working on that for a few years to get something that looks even remotely ok. In short, XMLWorker doesn't do what you think it does.
If you want MS Word to PDF conversion, you need another kind of solution. XDocReport (MIT license) is one of these, and it has plugins for both iText 2 (LGPL license) and iText 5 (AGPL license). Results are not perfect though.

PDF to SWF using swftools.org is not functioning properly

When I convert PDFs to SWF, a few files fail to convert, and during conversion server utilization is very high, which causes the server to hang. Sometimes I run a loop over hundreds of PDF files, and if one PDF cannot be converted to SWF due to some issue, the loop stops at that step and none of the remaining PDFs are converted. Please help me get out of these problems.
I use swftools to convert PDF files on the fly so they can be viewed inside an inline (Flash) PDF viewer. With certain documents I got errors (swftools 0.9.2) related to the contents of the PDF. A Google search suggested appending the option -s poly2bitmap, which solved the issue. Since I call swftools from within a Java application, I listen on stderr and act upon the errors raised there.
This post might also explain your problem.
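A minimal sketch of a batch loop that keeps going after a failed file, shown in Python rather than Java. The helper names, the output naming scheme and the timeout value are assumptions; it also assumes pdf2swf signals failure through a non-zero exit status, and if it does not, the captured stderr text would need to be inspected instead.

```python
import os
import subprocess


def pdf2swf_command(pdf_path):
    """Build the pdf2swf call; writing the SWF next to the PDF is an assumption."""
    swf_path = os.path.splitext(pdf_path)[0] + ".swf"
    return ["pdf2swf", pdf_path, "-o", swf_path, "-s", "poly2bitmap"]


def convert_all(pdf_paths, build_command=pdf2swf_command, timeout_seconds=120):
    """Convert each file in turn and return {path: reason} for the failures."""
    failures = {}
    for pdf_path in pdf_paths:
        try:
            result = subprocess.run(
                build_command(pdf_path),
                capture_output=True,      # collect stderr instead of losing it
                text=True,
                timeout=timeout_seconds,  # a stuck conversion cannot hang the batch
            )
        except subprocess.TimeoutExpired:
            failures[pdf_path] = "timed out"
            continue
        except OSError as exc:
            failures[pdf_path] = f"could not start converter: {exc}"
            continue
        if result.returncode != 0:
            failures[pdf_path] = result.stderr.strip() or f"exit code {result.returncode}"
    return failures
```

Calling `convert_all(["a.pdf", "b.pdf"])` processes every file and returns only the ones that failed, so a single bad PDF no longer stops the rest. Running the files one at a time with a timeout also caps how long any single conversion can occupy the server.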

Can we extract pdf pages using lua scripts

Our application receives a 150-page PDF file from the business line. I want to extract pages from this PDF file using Lua scripts.
Can anybody share their experience?
Thanks
Sure, you can do this. As long as you write a Lua module that can read PDF files.
There are some Lua modules for writing PDFs, but none for reading them. No public ones, at any rate. You may want to switch to Python for this, as there are quite a few Python modules for dealing with PDFs.
You could write a Lua wrapper calling something like pdftk.
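A minimal sketch of such a wrapper around pdftk's `cat` operation, written in Python for illustration. The function names are hypothetical; from Lua, the same command line could be handed to os.execute.

```python
import subprocess


def pdftk_extract_command(source_pdf, first_page, last_page, target_pdf):
    """Build a pdftk call that copies a page range into a new file."""
    if first_page < 1 or last_page < first_page:
        raise ValueError("page range must satisfy 1 <= first_page <= last_page")
    return [
        "pdftk", source_pdf,
        "cat", f"{first_page}-{last_page}",
        "output", target_pdf,
    ]


def extract_pages(source_pdf, first_page, last_page, target_pdf):
    """Run pdftk; raises CalledProcessError if pdftk reports a failure."""
    command = pdftk_extract_command(source_pdf, first_page, last_page, target_pdf)
    subprocess.run(command, check=True)
```

For example, `extract_pages("input.pdf", 10, 20, "pages-10-20.pdf")` would run `pdftk input.pdf cat 10-20 output pages-10-20.pdf`, provided pdftk is installed and on the PATH.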