How to retrieve files in Domino Web documents to embed them instead of showing them as links? - lotus-domino

I have a Notes app that was designed for the browser, not the client. It allowed upload of files into the documents, so nearly all the documents have files. The files are stored in the NSF as $FILE and displayed in the documents as links.
I am using Adobe Acrobat Pro to create PDFs from the documents and need to include the file attachments within the PDFs. However, the PDFs include only links to the files, not the attachments themselves. Can I write an agent to run against the documents to get those files and embed them within the documents? When I view those documents through the client, I see all of the HTML etc., and the file attachments appear at the bottom of the document. When I view the same documents in the browser, the file attachments do not appear. If I could ensure that they are there, they would be included in the PDFs generated by Acrobat Pro and could be opened from them.
I am really stuck here, with no other way to 'archive' this Notes database with all the data intact.
Thanks in advance for any insights!!
Ginni

There is a commercial product from Swing Software that does this. I hear that it's quite good, but I've never used it. Let me explain why...
The way I usually end up doing this is just quick-and-dirty. I write an agent to export the files, using the document UNID as part of the filename. The same agent exports all the data fields from the document into a CSV file, and I add a column with the filename of the extracted attachment. In your case, I would add two columns: one for the extracted attachment(s), and one for the generated PDF. The CSV serves as an index for the exported data. It can be imported into something more friendly, or just left as-is and brought up in Excel, depending on the customer's usage requirements and available systems.
For a couple of clients I've recommended Swing Software's product and offered to explore other ideas for developing code for PDF rendering of Notes documents (e.g., using wkhtmltopdf for Domino web apps to capture a WYSIWYG rendering based on an HTML crawl). None of them could justify the cost of buying licenses and/or writing the code. Quick and dirty always seems to win, even when retention and eDiscovery considerations are taken into account.
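To make the shape of that export concrete, here is a minimal sketch in Python rather than in a Notes agent. It assumes the documents have already been read into plain dictionaries; the `docs` structure, the `export_documents` name, the column names and the `<UNID>.pdf` naming for the generated PDF are all hypothetical, not part of any Domino API.

```python
import csv
from pathlib import Path


def export_documents(docs, out_dir):
    """Write each attachment to disk and build a CSV index of all documents.

    `docs` is a hypothetical in-memory stand-in for the Notes documents:
    a list of dicts with "unid", "fields" (dict of name -> value) and
    "attachments" (dict of original filename -> bytes).
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Collect every field name so the CSV has one stable set of columns.
    field_names = sorted({name for doc in docs for name in doc["fields"]})

    index_path = out_dir / "index.csv"
    with index_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.writer(fh)
        writer.writerow(["unid", *field_names, "attachments", "pdf"])
        for doc in docs:
            unid = doc["unid"]
            exported = []
            for original_name, data in doc["attachments"].items():
                # Prefix with the UNID so names never collide across documents.
                target = out_dir / f"{unid}_{original_name}"
                target.write_bytes(data)
                exported.append(target.name)
            row = [unid]
            row += [doc["fields"].get(name, "") for name in field_names]
            # Assumed convention: the PDF for a document is named <UNID>.pdf.
            row += [";".join(exported), f"{unid}.pdf"]
            writer.writerow(row)
    return index_path
```

The resulting `index.csv` has one row per document, so it can be opened directly in Excel or imported elsewhere, with the last two columns pointing at the extracted files and the PDF.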

Related

AEM (Adobe Experience Manager) Indexed PDF Search Results

My employer has recently switched its CMS to AEM (Adobe Experience Manager).
We store a large amount of documentation and our site users need to be able to find the information contained within those documents, some of which are hundreds of pages long.
Adobe are disappointingly saying their search tool will not search PDFs. Is there any format for producing or saving PDFs that allows the content to be indexed?
I think you need to configure an external index/search tool like Apache Solr and use a REST endpoint to sync DAM data and fetch results for queries.
Out of the box, AEM supports most binary formats without needing Solr. You only need Solr in advanced scenarios, like exposing search outside of Authoring or having millions of assets.
When any asset is uploaded to the AEM DAM, it goes through the DAM Update Asset workflow, which has a Metadata Processor step. That step extracts content from the asset, so "binary" assets such as Word documents, Excel files and PDFs become searchable. As long as the DAM Update Asset workflow is enabled, you will be OK.

How to generate PDF files using Liferay?

I tried to find proper services for generating PDF files in Liferay, but I have found only the class PDFProcessorUtil. How do I use it to generate a PDF file? How do I then save the generated file? I think I should use DLAppLocalServiceUtil.addFileEntry to save the file into Liferay storage.
Liferay's PDF conversion works by converting documents in the document library and offering them for download. This is implemented through OpenOffice. Install OpenOffice or LibreOffice, run it in server mode and configure Liferay to use it; you can then choose to download documents as PDF. The HTML format has a few limitations, as it can include so many external resources, so I'm not sure what your result will be.
If you're generating the HTML output yourself, you might want to consider some other (Liferay-independent) means of generating PDF, as you might not need to upload your files to the Document Library (e.g. if you're generating reports on the fly and just want the generator result to be PDF, but not store them). If this is what you need, you can use any PDF converter library you want. Liferay does not limit your choice.
You can also generate the PDFs from the serve resource phase of a portlet.
You put a button or a link somewhere, and when you click on it, you download the PDF.
In this simple example, the PDF is generated from a Freemarker template that produces HTML, which is then converted to PDF:
https://github.com/roclas/pdfUtil

Multi-page document merging Adobe Acrobat Pro

We have a number of policies (about 150 or so) we make available for download on our webpage.
Recently, Management had us move from all policies residing in one PDF to one policy per PDF. Their reasoning was to make it easier for the end users to download the policy that they want, and to make it easier for us to replace them when they change.
Now, some end users are complaining that they want to download the whole set of policies as one large PDF. Maintaining both formats as independent documents not only doubles our work, but increases the likelihood of error.
Since these are changed often, what I would like to do is to build a script to instruct Adobe Acrobat Pro to combine these individual policy PDFs together in a specific order.
This doesn't have to be a script, since a GUI method would also work.
Can this be done? If so, where can I look for examples?
The Docotic.Pdf library can merge PDF files while maintaining the outline (bookmarks) structure.
Nothing special needs to be done. You just append all the documents one after another, and that's all.
using BitMiracle.Docotic.Pdf;

using (PdfDocument pdf = new PdfDocument())
{
    // Paths of the PDFs to merge, in the order they should appear.
    string[] filesToMerge = ...
    foreach (string file in filesToMerge)
        pdf.Append(file);

    pdf.Save("merged.pdf");
}

Creating an ics file from data on a PDF file

I'm looking for a way to convert a PDF document into multiple ics files that staff can use to add their fortnightly roster to their smartphone calendars or to the Outlook calendar on their desktops. The information required to create the multiple files would be pulled from the PDF by searching for selected initials in each column and then referencing data from the same row as the initials. Is there a particular order the data needs to appear in within the ics file for it to import into a smartphone calendar?
You can search for PDF APIs for more details on handling a PDF programmatically.
Here are also some online converters that could help. They convert a PDF into Word:
http://www.pdftoword.com/success.aspx
http://www.pdfescape.com/account/?expired
However, reconstructing structured data from a PDF is not trivial, because a program has to deduce the semantics from the layout. So most programs can only recover scattered data from a PDF.
I've done this with Perl and the Windows Adobe PDF viewer: highlight all the text in the PDF, then cut and paste it into a text file. As the previous answer said, you have to write Perl (or any other text-processing language) to pick out the format of the PDF you have. Then you can print it with Perl to CSV, iCal or whatever format you want. I've shared my code on GitHub. I'm not sure if you know Git, but send me a private message if you want me to send the Perl code outside of Git.
The PDFs I've converted are here:
http://recplexonline.com/sports/hockey/old-geezers-hockey-35
The GitHub repository with my Perl code and the input files I used is here:
https://github.com/jdeltoft/PdfParse
It's pretty ugly Perl, sorry for that. But it works. I'll try to clean it up soon.
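On the question of ordering inside the ics file: as far as I know, the iCalendar format (RFC 5545) does not attach meaning to the order of properties within a component. What matters is the nesting (a VEVENT inside a VCALENDAR), the required properties (VERSION and PRODID on the calendar, UID and DTSTAMP on the event, plus a DTSTART) and CRLF line endings. A minimal sketch of writing one roster entry, once the data has been extracted from the PDF, might look like this. The function name, the PRODID value and the sample values are hypothetical.

```python
from datetime import datetime, timezone


def escape_text(text):
    """Escape characters that are special in iCalendar TEXT values."""
    return (text.replace("\\", "\\\\")
                .replace(";", "\\;")
                .replace(",", "\\,")
                .replace("\n", "\\n"))


def make_ics(uid, summary, start, end, stamp):
    """Return a minimal iCalendar document holding a single event.

    `start`, `end` and `stamp` are expected to be timezone-aware datetimes;
    they are written in UTC form.
    """
    def utc(dt):
        return dt.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")

    lines = [
        "BEGIN:VCALENDAR",
        "VERSION:2.0",
        "PRODID:-//example//roster-export//EN",  # hypothetical product id
        "BEGIN:VEVENT",
        f"UID:{uid}",
        f"DTSTAMP:{utc(stamp)}",
        f"DTSTART:{utc(start)}",
        f"DTEND:{utc(end)}",
        f"SUMMARY:{escape_text(summary)}",
        "END:VEVENT",
        "END:VCALENDAR",
    ]
    # iCalendar content lines are terminated with CRLF.
    return "\r\n".join(lines) + "\r\n"
```

Writing the returned string to a file with an `.ics` extension (in binary or with `newline=""` so the CRLF endings are preserved) gives one file per shift; several VEVENT blocks can also share one VCALENDAR.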

advice on technology to use for document/form creation and indexing

My customer currently stores his documents, which are single-page automotive forfeits, in a single MS Word document. This of course produces a huge file that is slow to open, not to mention slow to search.
After a user fills in a document, he may need to print it to sign it manually. The document is then scanned back and stored in PDF format. The document may be printed again to be signed a second time by a manager. The doubly signed document is scanned again and saved, overwriting the singly signed one.
The user wants to be able to search the document using a couple of search keys (the doc number and a sort of a SSN). That is the reason they are using a single file, to be able to search in the file using Word's search feature.
I have to propose an IT solution. I was thinking about giving them a software tool that:
reads a pdf form/template; the template rarely changes
shows the template on the screen and allows the user to input his variable fields in the form
some of the fields must be defined as searchable
the user saves only the form fields, not the whole pdf.
the software is able to rebuild a document by coupling the template with the fields. I have to find a way to tie the template to the saved fields, so that the template can change (versioning) without breaking the old documents
the tool allows searching across multiple documents, using the defined search fields
the tool allows printing the document to sign it manually; this is the hard part. Once the document is signed it cannot be changed anymore, but if the document is simply scanned and coupled with the form/fields PDF, then I'll lose the benefits of storing only the data, decoupled from the template. Should I only scan the signature and attach it to the document as an image?
What do you suggest to use?
Adobe XML Forms?
Adobe Forms Data Format?
An already existing software?
Other?
For the existing documents, I want to allow the customer to import his huge MS Word file into the new system.
Thanks.
Sounds like you want a PDF form template that submits data to a database that can be searched.
OTOH, if you just save the PDFs, Acrobat Pro can generate an index file from a directory, and that index can be searched (from Reader?). Yep, you can run searches on an index from Reader, but you can only build them with Acrobat.
I prefer AcroForms to LiveCycle forms myself. There's a lot more software out there that works with 'em. If you go with LiveCycle, you're almost completely locked into Adobe. And Adobe server software is EXPENSIVE.
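For the first option (form data submitted to a searchable database), a minimal sketch of the storage side could look like the following. It assumes SQLite, keeps only the field values plus the template version they were entered against, and indexes the two search keys from the question. The table layout and the names `open_store`, `save_form`, `find_forms` and `person_id` are hypothetical.

```python
import json
import sqlite3


def open_store(path=":memory:"):
    """Open (or create) the form store; only field data is kept, not PDFs."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS forms (
               doc_number       TEXT PRIMARY KEY,
               person_id        TEXT NOT NULL,
               template_version INTEGER NOT NULL,
               fields_json      TEXT NOT NULL
           )"""
    )
    # Second search key; doc_number is already indexed as the primary key.
    conn.execute(
        "CREATE INDEX IF NOT EXISTS idx_forms_person ON forms(person_id)"
    )
    return conn


def save_form(conn, doc_number, person_id, template_version, fields):
    """Store the field values with the template version used to fill them."""
    conn.execute(
        "INSERT OR REPLACE INTO forms VALUES (?, ?, ?, ?)",
        (doc_number, person_id, template_version, json.dumps(fields)),
    )
    conn.commit()


def find_forms(conn, doc_number=None, person_id=None):
    """Search by either key; returns a list of dicts ordered by doc_number."""
    query = ("SELECT doc_number, person_id, template_version, fields_json "
             "FROM forms WHERE 1=1")
    params = []
    if doc_number is not None:
        query += " AND doc_number = ?"
        params.append(doc_number)
    if person_id is not None:
        query += " AND person_id = ?"
        params.append(person_id)
    query += " ORDER BY doc_number"
    return [
        {"doc_number": d, "person_id": p, "template_version": v,
         "fields": json.loads(f)}
        for d, p, v, f in conn.execute(query, params)
    ]
```

Because each row records its `template_version`, an old document can be rebuilt against the template it was originally filled in with, even after the template changes.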