I want to create a program using Visual Basic that reads some text in a PDF file (the name on an invoice) and then creates a folder using that name. Is this possible, and how would I get started? I have some programming experience.
PDFs are difficult to manipulate. To do it efficiently, you'd need a library that lets you open the PDFs and extract the text from them.
I haven't used VB much, but I don't expect that there will be much support for PDFs.
You are probably better off using a language like Python, which has a lot of support for PDFs.
See for instance:
- http://www.binpress.com/tutorial/manipulating-pdfs-with-python/167
- https://pypi.python.org/pypi?:action=search&term=parse+pdf&submit=search
The first link also contains a few tutorials.
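Whichever library ends up extracting the text, the remaining steps from the question (find the name, create the folder) are small. Here is a minimal Python sketch, assuming the text has already been pulled out of the PDF and that the invoice has a line starting with a label such as "Name:". The label and the helper names find_name and make_folder are made up for illustration and would need adjusting to the real invoice layout.

```python
import re
from pathlib import Path

# Characters that Windows does not allow in folder names.
_INVALID = re.compile(r'[<>:"/\\|?*]')


def find_name(text, label="Name:"):
    """Return the text following `label` on its line, or None if absent.

    `label` is an assumption about how the invoice is laid out.
    """
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith(label):
            return stripped[len(label):].strip() or None
    return None


def make_folder(name, parent="."):
    """Create (if needed) a folder named after `name` under `parent`."""
    # Replace forbidden characters, then drop leading/trailing dots and spaces.
    safe = _INVALID.sub("_", name).strip(" .")
    if not safe:
        raise ValueError("name is empty after sanitizing")
    folder = Path(parent) / safe
    folder.mkdir(parents=True, exist_ok=True)
    return folder
```

find_name returns None when the label is missing, so check for that before calling make_folder.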
I have PDF files with embedded OCR data (I have already OCRed them), so they are searchable. Now I want to extract this OCR data because I want to put it in my Tomcat 6 search server. To do this, I need the plain OCR data.
So my question is: is it possible to extract this embedded OCR data from the PDF files?
It would be nice to get files with coordinates. But it would also be sufficient to get plaintext files.
You should be able to do this with iText or iTextSharp. iTextSharp has no documentation, however, and a good number of its functions are not equivalent to those found in iText.
PDFSharp does not support iref streams. Those are pretty much the only comprehensive open source solutions. If you do not mind paying, vista solutions may have something for you; they mostly handle workflow, but they have some pretty extensive PDF libraries as well.
I tried with iText and PDFBox.
It is not simple; we need to understand a lot of code for this.
Can anybody provide a simple way of reading and writing PDFs from a Java application?
The application must be standalone, with no need for any web or application server.
There are loads of simple examples for manipulating PDFs with iText in the iText in Action book.
PDF is a complex file format. What are you trying to do exactly?
Our eTendering solution, www.monaqasat.com, currently works exclusively with PDF documents for various reasons, some of them being security. We are being asked if we can support DWF documents. For this to happen, we would need to find a way to automatically convert DWF documents to PDF, using some kind of Unix application.
Does anybody know of any such application, preferably using Rails or Java?
Thanks,
.Karim
http://www.autodwg.com/pdf/
http://www.dwgto.com/
http://www.aidecad.com/
http://en.wikipedia.org/wiki/List_of_PDF_software
http://www.cogniview.com/convert-pdf-to-excel/category/pdf/
One suggestion would be to install a software printer, call its API to pass in the DWF and get back a PDF, and then apply security as needed.
Autodesk has its DWF Toolkit available at
http://www.autodesk.com/dwftoolkit
It contains full source code in C++ to read & write DWF files, so it should be reasonably easy to make it run under Linux and to use a PDF library to write the output.
We are developing a little application that, given a directory of PDF files, creates a single PDF file containing all the PDF files in the directory. This is a simple task using iTextSharp. The problem appears if the directory also contains other files, such as Word or Excel documents.
My question is: is there a way to convert Word and Excel documents into PDF programmatically? And even better, is this possible without having the Office suite installed on the computer running the application?
Office 2007 allows for this. I have found PDFCreator to be good (the VBA is included in the sample files), and I have heard that CutePDF is also good. PDFCreator and CutePDF are free.
To work without Office, you would need viewers, as far as I know:
http://www.microsoft.com/downloads/details.aspx?FamilyID=c8378bf4-996c-4569-b547-75edbd03aaf0&displaylang=EN
http://www.microsoft.com/downloads/details.aspx?familyid=95E24C87-8732-48D5-8689-AB826E7B8FDF&displaylang=en
I needed to do this myself, but managed to get it done with .Net and without 3rd party tools:
MSDN: Saving Word 2007 Documents to PDF and XPS Formats
Pretty simple, about 50 lines of code. However, I think you will need Word 2007 installed on the machine, as well as the ability to Save As PDF.
To convert Word documents to PDF, take a look at jWordConvert, a Java library that can do exactly that. It will not work with the Excel files, though, only with the Word files. It is Java rather than C#, but you could switch to iText (which is Java) instead of iTextSharp.
You can also use a component like activePDF's DocConverter to convert a lot of formats to PDF.
Use the PDFMaker that comes with Adobe Acrobat 7-9.
I just used this code: Convert Doc to PDF
I'm surprised Aspose wasn't mentioned here; it's easy, simple, and reliable. The downside is that it is not free.
I've used iTextSharp in the past. It's really good and easy to install (one DLL, I believe). The merge takes a bit of tinkering, so it's not as easy to use as Aspose, but hey, it's free, so that is the best part.
TallPDF.NET (comes with a hefty price tag) allows you to serve dynamic PDF from any .NET application including ASP.NET pages and web services.
PDFEdit (free and open source) is an editor for manipulating PDF documents. It has a GUI version and a command-line interface. Scripting is used to a great extent in the editor and almost anything can be scripted. It is possible to create your own scripts or plugins.
The most common way to convert files to PDF is to print them to a PDF printer driver. There are a number of such drivers; one that I know of that will do the job is Black Ice.
Another is to use Adobe Acrobat's SDK. From memory, it's very expensive.
It's been a while since I have actually done any work with converting PDFs, and the landscape may have changed.
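Whichever converter is chosen, the directory pass described in the question comes first: decide which files can be merged as they are and which need converting. Here is a minimal sketch in Python (not the asker's .NET) that uses only the standard library. The function name sort_for_merge and the extension list are assumptions for illustration, and the conversion and merge steps are deliberately left out because they depend on the tool picked above.

```python
from pathlib import Path

# Extensions treated as "needs conversion"; an assumed list, extend as required.
OFFICE_EXTENSIONS = {".doc", ".docx", ".xls", ".xlsx"}


def sort_for_merge(directory):
    """Split a directory's files into PDFs, Office documents, and the rest."""
    pdfs, to_convert, skipped = [], [], []
    for path in sorted(Path(directory).iterdir()):
        if not path.is_file():
            continue  # ignore subdirectories
        ext = path.suffix.lower()
        if ext == ".pdf":
            pdfs.append(path)
        elif ext in OFFICE_EXTENSIONS:
            to_convert.append(path)
        else:
            skipped.append(path)
    return pdfs, to_convert, skipped
```

The first list can go straight to the merge step, the second goes through the converter first, and the third is worth logging so that unsupported files are not dropped silently.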