I'm trying to overhaul a PDF report generation application built in CF8. It has an interface that generates a 50-page legal report as a PDF and sends it out about 100 times a day. However, it's very cumbersome and bogs down an already overworked server. Is there a good PDF compression script that I can run with ColdFusion, or a way to integrate with Adobe Acrobat so that the PDF is compressed before the server sends it via email? The system is already set up to use the available ColdFusion resources to help with this process, but it's still not sufficient.
Update: I had the opportunity to dig further into this issue. These documents are compiled via 4 CF forms where someone manually types in the legal data as it comes into the system. Some of the form fields are lengthy (accepting more than 10,000 characters). Once the forms are completed, a cfdocument tag converts everything into a PDF.
CFDocument generates bloated PDFs. We tested GhostScript and used the following parameters to compress a 22.3 MB PDF to 4 MB. (With PDFSETTINGS set to "screen", the file shrank further to 2.5 MB.)
http://www.ghostscript.com/
To use this, you'll have to perform this optimization as an extra step after the generation of your PDF and use cftry/catch in case there are any issues or timeouts.
<!--- Source PDF produced by cfdocument --->
<cfset ThePDFFile = "C:\test\OriginalPDF.pdf">
<!--- Ghostscript writes the optimized copy next to the source with an "-opt" suffix --->
<cfexecute name="c:\Program Files\gs\gs9.14\bin\gswin64.exe"
arguments="-sDEVICE=pdfwrite -dPDFSETTINGS=/prepress -dNOPAUSE -dQUIET -dBATCH -sOutputFile=#replace(ThePDFFile,'.pdf','-opt.pdf')# #ThePDFFile#" timeout="160">
</cfexecute>
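For comparison, here is a minimal Python sketch of the same post-processing step, with the error handling that the cftry/catch advice above refers to. It is an illustration under assumptions, not part of the original setup: the function names `build_gs_args` and `optimize_pdf` are hypothetical, and the binary path is simply the one from the snippet above. The Ghostscript switches are the same ones used above.

```python
import subprocess
from pathlib import Path

# Assumption: same Ghostscript location as in the ColdFusion snippet; adjust as needed.
GS_BINARY = r"c:\Program Files\gs\gs9.14\bin\gswin64.exe"


def build_gs_args(src, preset="prepress"):
    """Return (argument list, output path) for optimizing one PDF."""
    src = Path(src)
    dst = src.with_name(src.stem + "-opt.pdf")
    args = [
        GS_BINARY,
        "-sDEVICE=pdfwrite",
        f"-dPDFSETTINGS=/{preset}",  # "screen" trades quality for a smaller file
        "-dNOPAUSE", "-dQUIET", "-dBATCH",
        f"-sOutputFile={dst}",
        str(src),
    ]
    return args, dst


def optimize_pdf(src, preset="prepress", timeout=160):
    """Run Ghostscript; fall back to the original file on any error or timeout."""
    args, dst = build_gs_args(src, preset)
    try:
        subprocess.run(args, check=True, timeout=timeout)
    except (OSError, subprocess.SubprocessError):
        return Path(src)  # send the unoptimized PDF rather than fail
    return dst
```

Passing the arguments as a list avoids quoting problems when a path contains spaces.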
Another solution would be to generate your PDFs using WKHTMLTOPDF. The resulting files are much smaller, and the quality is consistent on ColdFusion 8-11. You can embed TrueType fonts without having to register them, and it doesn't have any of the HTML/CSS quirks that are present with CFDocument.
http://wkhtmltopdf.org/
Here's a link to an article and some sample ColdFusion code that you can use to compare the results of WKHTMLTOPDF to CFDocument.
http://gamesover2600.tumblr.com/post/108490381084/wkhtmltopdf-demo-to-compare-w-adobe-coldfusion
I was actually able to resolve it by using the NeeviaPDF compression application and tying it into ColdFusion with the following code:
<cfset pdf_file_name = 'qryGetFileJustUploaded' />
<cfexecute name="C:\Program Files (x86)\neeviaPDF.com\PDFcompress\cmdLine\CLcompr.exe"
arguments="C:\inetpub\wwwroot\testingFolder\PDFCompression2\pdf\#pdf_file_name# C:\inetpub\wwwroot\testingFolder\PDFCompression2\pdf\#pdf_file_name# -co -ci jpg -cq 10 -gi jpg -gq 10 -mi jbig2 -mq 1"
outputfile="C:\inetpub\wwwroot\testingFolder\PDFCompression2\output.txt"
timeout="250">
</cfexecute>
Here you pass the file name in through the variable #pdf_file_name#. If you want the compressed PDF to have a different name, pick a name and put it in place of the second #pdf_file_name# reference (the second C:\ path in the arguments).
I have a Notes app that was designed for the browser, not the client. It allowed upload of files into the documents, so nearly all the documents have files. The files are stored in the NSF as $FILE and displayed in the documents as links.
I am using Adobe Acrobat Pro to create PDFs from the documents and need to include the file attachments within the PDFs, however the PDFs just include links to the files, not the attachments. Can I write an agent to run against the documents to get those files and embed them within the documents? When I view those documents through the client, I see all of the HTML etc. and then at the bottom of the document, the file attachments appear. When I view these same documents in the browser, the file attachments do not appear. If I could merely ensure that they are there, then when running the PDF generator in Acrobat Pro, they would be included in the PDFs and executable.
I am really stuck here, with no other way to 'archive' this notes database with all the data intact.
Thanks in advance for any insights!!
Ginni
There is a commercial product from Swing Software that does this. I hear that it's quite good, but I've never used it. Let me explain why...
The way I usually end up doing this is just quick-and-dirty. I write an agent to export the files, using the document UNID as part of the filename. The same agent exports all the data fields from the document into a CSV file, and I add a column with the filename of the extracted attachment. In your case, I would add two columns -- one for the extracted attachment(s), and one for the generated PDF. The CSV serves as an index for the exported data. It can be imported into something more friendly, or just left as-is and brought up in Excel, depending on the customer's usage requirements and available systems. I've recommended Swing Software's product and offered to explore other ideas for developing code (e.g., using wkhtmltopdf for Domino web apps to capture a WYSIWYG rendering based on an HTML crawl) for PDF rendering of Notes documents for a couple of clients, but none of them have justified the cost that would be involved in buying licenses and/or writing the code. Quick and dirty always seems to win, even when there are retention and eDiscovery considerations taken into account.
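To make the index layout concrete, here is a small Python sketch of the CSV described above. The export itself would be done by a Notes agent; this only illustrates the shape of the index. The column names, the `write_index` function, and the `UNID_filename` naming scheme are assumptions made for illustration.

```python
import csv


def write_index(docs, out):
    """Write one CSV row per exported document.

    Each doc is a dict with hypothetical keys: unid, subject, attachments.
    """
    writer = csv.writer(out)
    writer.writerow(["unid", "subject", "attachments", "pdf"])
    for doc in docs:
        # Prefix every extracted attachment with the UNID so names stay unique.
        attachments = ";".join(f"{doc['unid']}_{name}" for name in doc["attachments"])
        writer.writerow([doc["unid"], doc["subject"], attachments, f"{doc['unid']}.pdf"])
```

The resulting file can be opened directly in Excel or imported into another system.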
Problem: Need to convert local html (with local images etc) to pdf from an AIX box running Universe 11.2.5 with System Builder
Current solution: FTP over html file to a Windows server which converts in batches and sends the e-mail to the destination
Proposed Solution: Do everything on the AIX box, from converting html to pdf and sending the e-mail.
Current problem: Unable to find a way to convert local html to PDF on the AIX box. I have tried many different approaches, including trying to install Python3, but to no avail.
The only really difficult part of the process is getting the HTML to render into a format that properly lays out your html as pages suitable for printing. There is a fair amount of magic that goes on between HTTP:GET and clicking print on a browser window that needs to be accounted for.
I was trying to accomplish something similar many moons ago on AIX but kind of ran into a skill level/time wall, because I was essentially going to have to create a headless browser to render the html. It looks like there are now some utilities that you might be able to leverage. I found this recently updated article on Super User that actually got me somewhat excited, especially since I don't use AIX anymore, so precompiled binaries and well understood, easily attainable dependencies are something I can actually have in my life.
https://superuser.com/questions/280552/how-can-i-render-a-website-as-an-image-from-the-shell
Good Luck.
There seem to be several questions rolled into this one item.
Converting HTML to PDF is just data manipulation that you could do in BASIC, but writing such code would be a large task. The option you use, sending it to another system, is valid, but it puts more points of failure into the system. I would think you could find code to do it on the AIX box.
Rocket plans on getting MV Python to work on AIX. This will make converting html to PDF much easier, since there are a lot of open source modules.
As for my suggestion of using sockets, that would apply if you intend to send it to a service that takes the html and returns the pdf document.
i.e. Is there a web service for converting HTML to PDF?
Once you have the pdf document, you can either store it in a UniVerse type-19 file, or base64-encode it and store it in a UniVerse hash file.
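As an illustration of the base64 option, here is a minimal Python sketch of the encoding round trip. The function names are hypothetical, and writing the resulting text into the UniVerse file is left out. The point is only that the encoded form is plain ASCII, so it cannot contain bytes that a record store might treat as delimiters.

```python
import base64


def encode_pdf(pdf_bytes):
    """Return the PDF as ASCII base64 text, suitable for a text record."""
    return base64.b64encode(pdf_bytes).decode("ascii")


def decode_pdf(text):
    """Recover the original PDF bytes from the stored base64 text."""
    return base64.b64decode(text)
```

Base64 grows the data by roughly one third, which is the price of storing it as text.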
Hope this helps,
Mike
I tried to find proper services for generating PDF files in Liferay, but I have found only the class PDFProcessorUtil. How do I use it to generate a PDF file? And how do I save the generated file afterwards? I think I should use DLAppLocalServiceUtil.addFileEntry to save the file into Liferay storage.
Liferay's PDF-conversion works by converting documents in the document library and offering them for download - this is implemented through Open Office. Install Open Office or Libre Office, run it in server mode and configure Liferay to use it, then you can choose to select downloads as PDF. The HTML format has a few limitations, as it can include so many external resources, so I'm not sure what your result will be.
If you're generating the HTML output yourself, you might want to consider any other (Liferay-independent) means of generating PDF, as you might not need to upload your files to the Document Library (e.g. if you're generating reports on the fly and just want the generator result to be PDF, but not store them). If this is what you need, you can use any pdf converter library you want - Liferay does not limit you in your choice.
You can also generate the PDFs from the serve resource phase of a portlet.
You put a button or a link somewhere, and when you click on it, you download the PDF.
In this simple example, the PDF is generated from a Freemarker template that produces HTML, which is then converted to PDF:
https://github.com/roclas/pdfUtil
We have a number of policies (about 150 or so) we make available for download on our webpage.
Recently, Management had us move from all policies residing in one PDF to one policy per PDF. Their reasoning was to make it easier for the end users to download the policy that they want, and to make it easier for us to replace them when they change.
Now, some end users are complaining that they want to download the whole set of policies as one large PDF. Maintaining both formats as independent documents not only doubles our work, but increases the likelihood of error.
Since these are changed often, what I would like to do is to build a script to instruct Adobe Acrobat Pro to combine these individual policy PDFs together in a specific order.
This doesn't have to be a script, since a GUI method would also work.
Can this be done? If so, where can I look for examples?
Docotic.Pdf library can merge PDF files while maintaining outline (bookmarks) structure.
Nothing special needs to be done. You just append all documents one after another, and that's all.
using (PdfDocument pdf = new PdfDocument())
{
    string[] filesToMerge = ...
    foreach (string file in filesToMerge)
        pdf.Append(file);
    pdf.Save("merged.pdf");
}
I have a PDF generated by a 3rd-party system. I have modified it using a PDF editor or other software.
Is it possible to detect if PDF file was modified, without original file?
I will add some more details.
There is no encryption and no signature features.
Document is created by IT system. User receives document and modifies it.
Is it possible to track that change somehow?
I thought that all these applications leave some data in the PDF header, or encoded somewhere inside the file, that could be checked. However, the properties shown by Windows Explorer reveal nothing, so I was wondering whether there is something smarter than viewing the properties/header in Explorer.
The problem with this is that just opening the PDF on a Mac in Preview and hitting Command-S to save the file will replace both the Creation and Modification date to match the current date/time. So even the creation date will be wrong. Even novice users can unknowingly do this, so if you're trying to track someone who may be purposefully modifying the document, it may lead to a false positive.
What you're asking is just too easy to spoof and fool unfortunately.
You could always check the md5sum of the pdf file. I'm not sure what environment you are using but that should help get you started.
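A checksum comparison only helps if a digest of the original was recorded somewhere when the document was issued; that is an assumption here, since the question says the original file is not available. Under that assumption, a minimal Python sketch (function names are hypothetical) could look like this:

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=65536):
    """Hash a file in chunks so that large PDFs need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_unmodified(path, recorded_digest, algorithm="md5"):
    """True if the file still matches the digest recorded at issue time."""
    return file_digest(path, algorithm) == recorded_digest.lower()
```

Any change to the file, even a re-save that leaves the visible content identical, produces a different digest.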
It's going to be rough without the original file unless security features like encryption or digital signatures were applied to it, which doesn't sound like the case. Do you have access to any information at all about the original file? A file size, creation date, any of the metadata, etc.?
If the tool used to modify the PDF is working according to the PDF spec then in the Info dictionary it should update ModDate but leave CreationDate alone. You may also see some non-zero generation numbers on the objects although it is just as possible that all the objects have been regenerated and will therefore be generation 0. The trial version of CosEdit will allow you to look at these 2 items.
If however the tool has been used to intentionally modify the PDF without leaving a trace then they would be spoofing those bits of data so they won't help you.
Are the users modifying the PDF using Acrobat? If so then what Danio mentioned above should work. Strictly speaking, modifying the PDF should change its ModDate or xmp:ModifyDate without changing its CreationDate. However not all tools adhere to this; quite a few simply leave all metadata untouched, so this method of checking isn't 100% reliable unless you know what PDF editor your users employ.
If the editor your users use does change ModDate or xmp:ModifyDate, then you should be able to see it in two places. One is when you open the document in Acrobat and hit Ctrl-D to view Document Properties. The Creation field and Modified field should have different timestamps. There may also be APIs that can be used to programmatically retrieve this metadata. The other way you can visualize it is to simply open the PDF in Notepad and search for the properties. Most of the document won't be human readable but these timestamps should be. If they do get changed appropriately, you can always parse for them in your application. Good luck!
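The "open it in Notepad and search" approach can be sketched in Python as a scan of the raw bytes for the Info dictionary dates. This is a rough illustration, not a PDF parser: it assumes the dates are stored as plain literal strings, which will not be the case if the file is encrypted or keeps its metadata in a compressed object stream. The function names are hypothetical.

```python
import re

# Matches entries such as: /ModDate (D:20190824160229+02'00')
DATE_RE = re.compile(rb"/(CreationDate|ModDate)\s*\(([^)]*)\)")


def info_dates(pdf_bytes):
    """Collect every CreationDate and ModDate literal found in the raw bytes."""
    found = {"CreationDate": [], "ModDate": []}
    for key, value in DATE_RE.findall(pdf_bytes):
        found[key.decode("ascii")].append(value.decode("latin-1"))
    return found


def looks_modified(pdf_bytes):
    """True/False if both dates are readable, None if either one is missing."""
    dates = info_dates(pdf_bytes)
    if not dates["CreationDate"] or not dates["ModDate"]:
        return None
    # Incremental saves append newer copies, so compare the last occurrences.
    return dates["ModDate"][-1] != dates["CreationDate"][-1]
```

A result of False does not prove the file is untouched, for the reasons given above: some editors leave the metadata alone, and the dates are easy to spoof.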
If you're using Ubuntu Linux 18.04 with Document Viewer, you can:
click on File options (3 vertical line ellipsis)
click on Properties...
look for Created / Modified fields in the Properties pop up
Beware: A sufficiently knowledgeable user can manipulate the PDF contents without changing the Created and Modified time stamps in the PDF metadata and the file system.
You can use a tool to read the PDF file's properties.
I use pdfinfo, which reports many properties of the file that you can then check.
pdfinfo 58dcc41d01293.pdf
Author: worker
Creator: Microsoft® Word 2016
Producer: Microsoft® Word 2016
CreationDate: Sat Aug 24 16:02:29 2019
ModDate: Sat Aug 24 16:02:29 2019
Tagged: yes
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 55
Encrypted: no
Page size: 841.92 x 595.32 pts (A4)
Page rot: 0
File size: 3346838 bytes
Optimized: no
PDF version: 1.7
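If you want to check these fields from a script, here is a minimal Python sketch that runs pdfinfo and splits its "Key: value" lines into a dictionary. It assumes pdfinfo is on the PATH, and the function names are hypothetical.

```python
import subprocess


def parse_pdfinfo(text):
    """Turn pdfinfo's 'Key: value' lines into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def pdf_properties(path):
    """Run pdfinfo on a file and return its parsed fields."""
    result = subprocess.run(["pdfinfo", path], capture_output=True, text=True, check=True)
    return parse_pdfinfo(result.stdout)
```

For the output above, `CreationDate` and `ModDate` come back equal, which is consistent with a file that has not been re-saved by a tool that updates ModDate.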