InDesign Server PDF creation

My company uses PDFlib to generate PDFs and is now thinking of moving to InDesign.
I am doing some initial evaluation. This is our normal process:
A designer designs the layout of the PDF.
Developers put the real text in. Text lengths can vary, so developers call a function to determine how much space the text needs; if it runs over a page, they create a new page through the PDFlib API.
Can I do the same thing with InDesign Server?

You can certainly do what you're doing in pdflib and then some with InDesign Server ("IDS"). IDS has minimal "server" features and is mainly a headless version of desktop Adobe InDesign. There are basically two parts to making a PDF from InDesign Server:
Call the server and tell it to make the PDF (this is typically a SOAP message that tells IDS what script to run and what data source, possibly passing data and/or job parameters with the SOAP message).
Run a script with ExtendScript (while there are other languages you can automate IDS with, this is far and away the most common).
You can find the scripting documentation for InDesign CS6 and earlier scripts here:
http://www.adobe.com/devnet/indesign/sdk.html
Download the Adobe CS6 Scripting SDK (and possibly the InDesign CS6 Server SDK, though really the Scripting SDK and the IDS documentation installed with IDS are all you need). The Scripting SDK includes the "InDesign Server Scripting Guide", which contains a "Hello World" example that makes a document and exports it to PDF.
The scripting in InDesign CC is available only through the pre-release program: you need to apply to Adobe to get access if this matters to you. CC is actually extremely similar to the CS6 version--the main difference is in added features. A script that works in CS6 is likely to work in CC.

Related

Selecting text and image from pdf through any programming language

I'm trying to develop a tool/web application that imports a PDF file and lets me select the text and images in it with a mouse click, then mark each selection as title, content, or image with a button click (three different buttons). The marked content and images will be copied to the clipboard or pasted into a Word document, which will be a separate part of the project. Which programming language can I use for this?
I'd probably start by researching a pure browser-side solution using pdf.js and the clipboard API.
Otherwise, you'd still need clipboard API in the browser and the server-side may actually be powered by any programming language which can be hooked into a web server and has a library to parse PDFs.
You said nothing at all about your prospective server platform, but to name a few: .NET has PdfSharp, which is able to read PDFs, and Python has a host of tools available for it. After all, there are a bunch of command-line utilities for extracting data from PDFs, and they can be called from any programming language able to launch external processes.
Note that this only appears to be a simpler solution than using pdf.js. Unless your PDFs are really uniform (say, invoices created by some piece of software), so that your PDF parser can know which bits of data it has to extract and return, the parser will need to return all the data it extracted to the client, and you'll need to somehow render it all there. Maybe that's exactly what you need, but maybe not.
Since PDFs are really tailored for typesetting and not presenting information in a structured manner, I'd try to piggyback on an already hard-core PDF rendering solution which runs in the browser, so see above.

Is there a way to automate/script (e.g. Perl) a check of a tagged PDF file to see if it's PDF/UA compliant?

We have some tools that generate PDF. We want to automate some testing to make sure the generated PDFs are tagged (PDF/UA) and that the tags are valid.
There are a lot of interactive checkers (Acrobat, PDF Accessibility Checker (PAC), etc.). They generate reports of things that pass or fail in the PDF based on the Matterhorn Protocol. I'd like to generate similar reports, but automatically.
I recently found a Perl module, PDF::API2, that might be promising, but I only wrote a few simple tests with Perl about 15 years ago. Has anyone used that module for tagged PDF checking, or have you done this with a different scripting language?
The technology used in Adobe Acrobat (in its Preflight component) is developed by callas software (caution: I'm heavily affiliated with this company). callas also develops the same technology under the name pdfaPilot, which exists in a manual version but also in command-line and SDK versions that fully automate the process.
But!
As stated by Max Wyss in his comment on your question already, there are two parts to PDF/UA checking. Some of the specification's rules can be tested automatically by software, but a lot of them cannot.
To give one example, it is possible to verify programmatically that all text in a PDF document is tagged with a language. It's a whole other ballgame to check whether those language tags are actually correct.
pdfaPilot Desktop actually allows you to automatically check what is possible, and then allows you to convert the PDF/UA file into visually tagged HTML which makes it much easier to verify that meaning and structure of the text are correct.
In other words, yes, such technology exists, but it will never be 100% complete.

PDF conversion service

I need to develop a service that can convert MS Office and OpenOffice documents to PDF. The PDFs also need to be commentable when opened in Adobe Reader.
I have used a piece of software from www.neevia.com. It does the conversion but cannot make the PDFs commentable, and is therefore useless in my scenario.
Ideally I would like a piece of software that monitors a directory: when a file is committed to that directory, the software detects it, fetches the file, converts it, and puts the result in another directory. This way I can programmatically put the file I want converted in the IN folder and monitor the OUT folder to fetch the file once it has been converted.
So does anyone know of a piece of software capable of converting MS Office and OpenOffice files to commentable PDFs?
It sounds like you're after the "Extend Features In Acrobat Reader" document rights feature that's part of Acrobat Professional. If you want a programmatic way of doing it then Adobe LiveCycle is the only game in town. This is one of the features that Adobe keeps for itself and no third party is legally allowed to provide it.
You could programmatically, using Office automation, print documents to a PostScript printer driver to get a PostScript file, then use Ghostscript to convert the PS file to PDF. I'm not sure which commenting features Adobe Reader supports as opposed to the full version of Acrobat, but this should create a reasonably well supported PDF file.
A-PDF may do what you want. Its web site claims it can convert Office docs into PDF, including batch conversion by watching a folder.
Both Office 2007 and OpenOffice can save directly to PDF, so you could automate that process.
However, changing the "document rights" of the PDF to allow commenting is something that only Adobe Acrobat can do. (This is Adobe's way of selling more product). There are other 3rd-party tools out there that claim to be able to do it (google change pdf +"document rights"), but I can't vouch for any of them.
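For the watched-folder part of the question, a simple polling loop is often enough. The following is a minimal Python sketch, not a finished service: the function names, the suffix list, and the `done` directory are my own assumptions, and the actual conversion is passed in as a callable so that whichever converter you settle on can be plugged in. It does nothing about Reader commenting rights.

```python
import shutil
import time
from pathlib import Path
from typing import Callable

# Extensions treated as convertible; an assumption made for this sketch.
OFFICE_SUFFIXES = {".doc", ".docx", ".xls", ".xlsx", ".odt", ".ods", ".rtf"}


def process_pending(in_dir: Path, out_dir: Path, done_dir: Path,
                    convert: Callable[[Path, Path], None]) -> list[Path]:
    """Convert every office file currently in in_dir; return the PDFs written."""
    produced = []
    for src in sorted(in_dir.iterdir()):
        if not src.is_file() or src.suffix.lower() not in OFFICE_SUFFIXES:
            continue  # leave unrelated files alone
        target = out_dir / (src.stem + ".pdf")
        convert(src, target)  # caller-supplied converter (hypothetical hook)
        # Move the source out of the IN folder so it is not converted twice.
        shutil.move(str(src), str(done_dir / src.name))
        produced.append(target)
    return produced


def watch(in_dir: Path, out_dir: Path, done_dir: Path,
          convert: Callable[[Path, Path], None], interval: float = 2.0) -> None:
    """Poll in_dir forever, converting whatever shows up."""
    while True:
        process_pending(in_dir, out_dir, done_dir, convert)
        time.sleep(interval)
```

One thing this sketch ignores is a file that is still being written when the poll runs. A common workaround is to have the producer write under a temporary name and rename the file into the IN folder once it is complete.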
I believe the commenting features are part of the PDF software, and not the file. Acrobat Professional lets you add comments, while Reader has fewer capabilities.
Hmmm, you can develop your own or just buy it off the shelf. My company (shameless plug) has a product that does server based PDF Conversion for common Office formats and can be invoked via a web service.
Blogged about it here. Making office work reliably on the server (32bit/64bit, Win2K3/Win2K8) is challenging to say the least.

Is there a free way to convert RTF to PDF?

How can I programmatically convert RTF documents to PDF?
OpenOffice.org can be run in server mode (i.e. without any GUI), can read RTF files and can output PDF files.
You have a number of options depending on:
- the platform(s) your application will be running on
- whether your application will be a server application (e.g. a web service that you set up once and then it runs), or a widely-available desktop application (e.g. something that must be easily downloadable and installable by many people)
- whether you are willing to put little or more programming effort into getting the solution to work
- whether you are flexible as to the programming language you will use
Here are some options:
1. PDFCreator + COM
   - Windows only
   - suitable for both desktop and server applications
   - medium programming effort
   - any language that allows you to speak COM
2. OpenOffice ( + JODConverter - optional )
   - Cross-platform (Windows, Linux, etc.)
   - suitable for server applications, as OpenOffice is a 100MB+ download
   - low programming effort
   - Java (if using JODConverter), or any language that can interface with OpenOffice's UNO
3. iText + Apache POI
   - Cross-platform (Windows, Linux, etc.)
   - suitable for both desktop and server applications
   - high programming effort
   - Java
EDIT
Here is an older post that has some commonality with your question.
EDIT 2
I see from your comments that you are on Linux and open to either C++ or Java. Definitely use option 2.
JODConverter (Java): the library takes care of spawning OpenOffice in headless mode and talking Uno to it on your behalf. You provide JODConverter with an input and output file name as well as the input and output types (e.g. rtf and pdf), and when it returns to you the output file is ready.
C++: you can fork+exec one (or more, for load balancing) OpenOffice instances in headless mode (soffice will listen for UNO requests on a socket e.g. port 8100.) From your application use Uno/CPP to instruct OpenOffice to perform the conversion the same way JODConverter does (see the JODConverter source code for how to do this.)
/opt/openoffice.org3/program/soffice.bin \
"-accept=socket,host=127.0.0.1,port=8100;urp;" \
-headless -nocrashreport -nodefault \
-nolockcheck -nologo -norestore
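If you launch that listener from your own code rather than from a shell, you also need to know when it is ready to accept UNO connections. Here is a rough Python sketch of that idea, reusing the path, port, and flags from the command above; the function names and the timeout values are my own assumptions.

```python
import socket
import time


def soffice_command(port: int,
                    binary: str = "/opt/openoffice.org3/program/soffice.bin") -> list[str]:
    """Argument list for a headless listener.

    Passing a list to subprocess means no shell is involved, so the
    semicolons in the -accept string need no quoting.
    """
    return [
        binary,
        f"-accept=socket,host=127.0.0.1,port={port};urp;",
        "-headless", "-nocrashreport", "-nodefault",
        "-nolockcheck", "-nologo", "-norestore",
    ]


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Return True once a TCP connect succeeds, False if the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.5)  # not listening yet; retry shortly
    return False
```

Typical use would be `subprocess.Popen(soffice_command(8100))` followed by `wait_for_port("127.0.0.1", 8100)` before sending the first conversion request. A successful connect only shows that something is listening on the port, not that it speaks UNO.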
I am successfully using JODConverter from a Java app to convert miscellaneous document types (some documents dynamically generated from templates) to pdf.
Four years late to the party here, but I use Ted in my web application. I generate RTF programmatically, then use the rtf2pdf.sh script included in the package to generate the PDF. I tried OOo and unoconv previously, but Ted proved faster and more reliable in my application.
Use PDFCreator, a free pdf printer. Just print to pdf. You can control this through COM. Example code is in the COM folder of the install directory.
PDFCreator for Windows is the easiest for single documents.
It's also possible to automate PDF creation for large sets of documents by converting them to XML and using XSLT and XSL-FO. There are lots of tutorials for this out there.
For a specific language, such as Python, libraries exist that make writing PDF output fairly trivial.
The only advantage of XML over other simpler solutions is extensibility. You could also programmatically output your document in RTF, HTML, TXT, or just about any other text format.
LibreOffice can convert RTF documents to PDF via command line.
Here are the instructions to install it on CentOS.
And this is an example to initiate conversion from PHP code:
<?php shell_exec('libreoffice4.2 --headless --invisible --norestore --convert-to pdf test.rtf'); ?>
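The same call can be made from other languages. As an illustration, here is a Python sketch under a few assumptions: the binary is on the PATH as `soffice` (the name varies by install, as the `libreoffice4.2` above shows), and the function names and the timeout are mine. It adds `--outdir` so the result lands in a known place, and it checks that the PDF actually exists rather than trusting the exit status alone.

```python
import subprocess
from pathlib import Path


def convert_command(src: Path, out_dir: Path, binary: str = "soffice") -> list[str]:
    """Build the headless conversion command as an argument list."""
    return [binary, "--headless", "--norestore",
            "--convert-to", "pdf", "--outdir", str(out_dir), str(src)]


def convert_to_pdf(src: Path, out_dir: Path, binary: str = "soffice",
                   timeout: float = 120.0) -> Path:
    """Run the conversion and return the path of the PDF it produced."""
    subprocess.run(convert_command(src, out_dir, binary),
                   check=True, timeout=timeout)
    pdf = out_dir / (src.stem + ".pdf")
    if not pdf.exists():
        # Guard against a run that exits cleanly without writing output.
        raise RuntimeError(f"conversion produced no file for {src}")
    return pdf
```

For example, `convert_to_pdf(Path("test.rtf"), Path("out"))` should leave the result at `out/test.pdf` if the conversion succeeds.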
PrimoPDF. It acts as a virtual printer, so you just print to it, and out pops a PDF.
Look at PDF Printer

How to convert Word and Excel documents to PDF programmatically?

We are developing a little application that, given a directory of PDF files, creates a single PDF file containing all the PDF files in the directory. This is a simple task using iTextSharp. The problem appears when the directory also contains other files, such as Word or Excel documents.
My question is: is there a way to convert Word and Excel documents to PDF programmatically? And even better, is this possible without having the Office suite installed on the computer running the application?
Office 2007 allows for this. I have found PDFCreator to be good, the VBA is included in sample files, and have heard that CutePDF is also good. PDFCreator and CutePDF are free.
To work without Office, you would need viewers, as far as I know:
http://www.microsoft.com/downloads/details.aspx?FamilyID=c8378bf4-996c-4569-b547-75edbd03aaf0&displaylang=EN
http://www.microsoft.com/downloads/details.aspx?familyid=95E24C87-8732-48D5-8689-AB826E7B8FDF&displaylang=en
I needed to do this myself, but managed to get it done with .Net and without 3rd party tools:
MSDN: Saving Word 2007 Documents to PDF and XPS Formats
Pretty simple, about 50 lines of code. However, I think you will need Word 2007 installed on the machine, as well as the ability to Save As PDF.
To convert Word documents to PDF, take a look at jWordConvert, a Java library that can do exactly that. It will not work with the Excel files though, only with the Word files. The language is Java, not C#, but you could switch to iText (which is Java) instead of iTextSharp.
You can also use a component like activePDF's DocConverter to convert a lot of formats to PDF.
Use the PDFMaker that comes with Adobe Acrobat 7-9.
I just used this code: Convert Doc to PDF
I'm surprised Aspose wasn't mentioned here, it's easy, simple, and reliable. Downside is that it is not free.
I've used iTextSharp in the past. It's really good and easy to install (one DLL, I believe). The merge takes a bit of tinkering, so it's not as easy to use as Aspose, but hey, it's free, so that is the best part.
TallPDF.NET (comes with a hefty price tag) allows you to serve dynamic PDF from any .NET application including ASP.NET pages and web services.
PDFEdit (free and open source) is an editor for manipulating PDF documents. It has a GUI version and a command-line interface. Scripting is used to a great extent in the editor and almost anything can be scripted. It is possible to create your own scripts or plugins.
The most common way to convert files to PDF is to print them to a PDF printer driver. There are a number of such drivers; one that I know of that will do the job is Black Ice.
Another is to use Adobe Acrobat's SDK. From memory, it's very expensive.
It's been a while since I have actually done any work with converting PDFs, and the landscape may have changed.