When I convert PDFs to SWF, a few files fail to convert, and during conversion server utilization is very high, which causes the server to hang. Sometimes I run a loop over hundreds of PDF files, and if one PDF fails to convert because of some issue, the loop stops at that step and none of the remaining PDFs are converted. Please help me get past these problems.
I use swftools to convert PDF files on the fly so they can be viewed inside an inline (Flash) PDF viewer. With certain documents I ran into errors (swftools 0.9.2) caused by the contents of the PDF. A Google search suggested appending the option -s poly2bitmap, which solved the issue. Now, since I call swftools from within a Java application, I would listen on stderr and act on the errors raised there.
This post might explain your problem as well.
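A batch run that survives individual failures could look roughly like the sketch below. It is written in Python rather than Java, and it makes several assumptions: pdf2swf is on the PATH, it signals failure with a non-zero exit code and a message on stderr, and a 120-second limit per file is acceptable. The names convert_one and convert_all are made up for illustration. Files are converted one at a time, so only a single pdf2swf process loads the server at any moment, and a file that hangs is killed when the timeout expires.

```python
import subprocess
from pathlib import Path


def convert_one(pdf_path, swf_path, timeout_s=120, run=subprocess.run):
    """Convert one PDF with pdf2swf; return (ok, message) instead of raising."""
    # -s poly2bitmap is the option mentioned above for problematic documents.
    cmd = ["pdf2swf", str(pdf_path), "-o", str(swf_path), "-s", "poly2bitmap"]
    try:
        result = run(cmd, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this.
        return False, "timed out"
    except OSError as exc:
        return False, f"could not start pdf2swf: {exc}"
    if result.returncode != 0:
        # Assumption: a non-zero exit code means the conversion failed.
        return False, result.stderr.strip() or f"exit code {result.returncode}"
    return True, ""


def convert_all(pdf_dir, out_dir, convert=convert_one):
    """Convert every *.pdf in pdf_dir; collect failures and keep going."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    failures = {}
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        ok, message = convert(pdf, out_dir / (pdf.stem + ".swf"))
        if not ok:
            failures[pdf.name] = message  # record the error, continue the loop
    return failures
```

Calling convert_all("incoming", "converted") returns a dictionary of the files that failed, which can be logged or retried later without blocking the rest of the batch.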
We have an automated program that processes PDF documents. It fails on certain PDFs because they are malformed. When we open those PDFs in Acrobat, it opens them; I see that Acrobat goes to extra measures to fix malformed PDFs. So in our case, someone has to manually open and save them to make them clean. Is there a way I can do this programmatically in Python or PowerShell? Has anyone done this?
Thanks!
You might try this link.
You can run a macro from PowerShell. You can also set up a scheduled task to run your PowerShell script in Task Scheduler at pretty much any interval you like (TASKSCHD.MSC). This particular example uses a msgbox to ask for the path to the folder, but it loops through all PDF files in a folder, flattens them, and saves them. Flattening might not be required, but it might help with a malformed PDF.
** This relies on Acrobat and uses the javascript API through the excel ... I'm not sure if LibreOffice Draw has a javascript API like Acrobat. I'm not aware of any open source alternatives that have that sort of functionality. If anyone is, please let me know.
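One possible open source route, which is a different technique from the Acrobat macro above and not something this thread confirms for your files, is to have Ghostscript re-write each PDF through its pdfwrite device. Ghostscript rebuilds the file while reading it, which repairs some (not all) malformed PDFs. The sketch below is Python and assumes the Ghostscript executable is reachable as gs (on Windows it is typically gswin64c); the function names are made up for illustration.

```python
import subprocess


def build_rewrite_command(src, dst, gs_exe="gs"):
    """Build the Ghostscript command that re-writes src into dst."""
    # -o sets the output file and implies batch, non-interactive mode.
    return [gs_exe, "-o", str(dst), "-sDEVICE=pdfwrite", str(src)]


def rewrite_pdf(src, dst, gs_exe="gs", timeout_s=300):
    """Run Ghostscript on one file; return True if it exited cleanly."""
    result = subprocess.run(
        build_rewrite_command(src, dst, gs_exe),
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
    return result.returncode == 0
```

Whether the re-written file is acceptable depends on the document, so it is worth spot-checking the output against what Acrobat produces before automating the step.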
We have created PDFs from converting individual PostScript pages into a single PDF (and embedding appropriate fonts) using GhostScript.
We've found that an individual page of the PDF cannot be linked to; for example, through the usage of
http://xxxx/yyy.pdf#page=24
There must be something within the PDF that makes this not possible. Are there any specific GhostScript options that should be passed when creating the PDF that would allow this type of page-destination link to work?
There are no specific pdfwrite (the Ghostscript device which actually produces PDF) options to do this. Without knowing why the (presumably) web browser or plugin won't open the file at the specified page, it's a little difficult to offer any more guidance.
What are you using to view the PDF files?
Can you make a very simple file that fails? Can you make that file public?
If I can reproduce the problem, and the file is sufficiently simple, it may be possible to determine the problem. By the way, which version of Ghostscript are you using?
After some work with PHPExcel, I finally got it to generate sheets of 3000 cells in ~5 seconds by using a big array.
With the same data, I need to generate some PDF files. I tried to do it with PHPExcel, but it is not a good choice: generating a PDF file with PHPExcel took a lot of time and a lot of resources.
I also tried generating a PDF file with the html2pdf PHP library. A file containing a table with 3000 cells took 20 seconds to generate.
My problem is that I can't find a good solution to my problem. Do you know any good library? Do you know any good practices in generating pdf files faster, with a low load on server side?
You can use the FPDF library to generate PDF files in a fast manner and you can use the Write HTML tables add-on to achieve what you want (see example at the bottom of the page).
PhpExcel uses TCPDF to generate PDF, the same as HTML2PDF with PHP5:
HTML2PDF is a HTML to PDF converter written in PHP4 (use FPDF), and PHP5 (use TCPDF).
I think that when generating a PDF, PhpExcel first generates XLS, then converts it to HTML, then again converts it to PDF. Not very efficient.
That is why by using HTML2PDF you can cut to 20 seconds.
--
To cut waiting time even more, maybe you could try another library, like dompdf, and keep skipping PhpExcel when what you need is a PDF.
If your table doesn't have formulas, you can generate all the content in an array, and pass it to some function to generate an XLS with PhpExcel, and to another to generate a PDF.
My program downloads a PDF file from a source location every day. When I see the binary text of the PDF file in Notepad, I find that sometimes the PDF file has the string <!-FTCACHE-1-> at the end. Sometimes this word is missing from the PDF file.
My program downloads this PDF daily and compares it with the previous day's PDF file using the Windiff binary comparison.
99% of the time, Windiff reports differences in the PDF file just because one PDF contains the string <!-FTCACHE-1-> at the end.
Does anyone know what the reason behind this is?
Thanks,
Praveen
<!--FTCACHE-1--> is generated by FatWire Content Server, a web content management solution that is probably generating your URL. FTCACHE means FutureTenseCache, the name of the original product component. The text is a "footer" flag that indicates to the caching module whether or not the page was properly generated. If the page is supposed to be cached, a 1 indicates that the page was properly built, and so is cacheable. If 0 is returned, it indicates that the page was corrupted and should not be cached. The Satellite Server caching engine is supposed to strip this footer once it reads it.
In other words, the key that is there to ensure that the cache is not corrupted is causing the corruption in your PDF.
This issue has been fixed in patches to FatWire ContentServer for quite some time now.
For your purposes, just ignore the string - strip it if you can.
Sorry about that. That was my bug. :-)
The application that generates the PDF file has a bug: the FTCACHE tag should not be there, because it is not a valid PDF construct. Its presence actually damages the PDF file and invalidates the FastWebView feature in it, as you have seen. It is safe to remove it before comparing the files.
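Removing the trailer before the comparison could be sketched as follows, in Python. The function names are hypothetical, and the pattern accepts both the single-dash and double-dash spellings of the tag quoted in this thread, anchored to the very end of the file so that nothing inside the PDF body is touched.

```python
import re

# Matches <!--FTCACHE-1--> (or the <!-FTCACHE-1-> spelling) plus any
# trailing whitespace, but only at the very end of the data.
_FTCACHE_TRAILER = re.compile(rb"<!--?FTCACHE-\d+--?>\s*\Z")


def strip_ftcache(data: bytes) -> bytes:
    """Return data without a trailing FTCACHE tag, if one is present."""
    return _FTCACHE_TRAILER.sub(b"", data)


def same_pdf(path_a, path_b) -> bool:
    """Compare two files byte for byte, ignoring a trailing FTCACHE tag."""
    with open(path_a, "rb") as a, open(path_b, "rb") as b:
        return strip_ftcache(a.read()) == strip_ftcache(b.read())
```

The daily job can then call same_pdf on today's and yesterday's downloads, or write the stripped bytes back to disk before handing the files to Windiff.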
"FT" could be FreeType, the open source font engine. The comment probably comes from the software that generates the PDF. If you can somehow identify that, you could (assuming it is open source) perhaps take a look through it and see what causes it to emit the comment.
FreeType has a source folder dedicated to caching, the root source file there is called ftcache.c. It doesn't do a lot though, just #includes (!) the other source files.
Googling the string you see reveals several more or less random PDFs that seem to contain it.
Is there any way to generate PDF files from classic ASP? I have a bunch of user-entered data that needs to be turned into a PDF that the user can download. How can I do this? OpenOffice allows exporting documents to PDF, so could this somehow be leveraged?
I played around a bit with this (Persits ASPPDF): http://www.asppdf.com/
Maybe run an external application that uses CrystalReports... and just pass it the data as XML?
That's how I would do it... (lazy mode)
See a full list of PDF components here: http://www.aspin.com/home/components/document/pdf Many of them are free.
It is also possible to use XSLT to output PDF, but I am not sure if this is supported by the Microsoft XML Parser. I remember there was something stopping me when I tried to do this 3-4 years ago. It might be worth checking out now, depending on the type of data you have as a source.
However if these are static files or a one time job consider using a PDF converter on your computer and just upload the files to the server. There are heaps of tools for this, including Adobe Acrobat.