My question is simple and clear, yet nobody else seems to have this problem. I insist on using the Foxit libraries because they support Farsi text recognition. My programming language is VBA.
It seems that neither PhantomPDF nor the Foxit SDK libraries offer a method that only OCRs a PDF file, without saving it as an Excel file afterwards.
I'm currently using my scanner to turn my PDFs into searchable PDFs. The OCR is already taken care of, since I can use ctrl-f within the PDF.
How can I get at the OCR'd content from my program, though?
I'm open to using Java or Ruby; the question is more or less language-agnostic. Is the OCR'd text openly accessible by reading the file?
I'm not sure how your OCR software creates the PDF, but could you use a third-party library such as JPedal, or a tool such as iText or Xpdf, to extract the text from the resulting PDF?
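As a rough sketch of the Xpdf route: Xpdf (and its fork Poppler) ships a command-line tool called pdftotext that dumps a PDF's text layer to a file, and it can be driven from any language that can start a process. The Python below assumes pdftotext is installed and on the PATH; the function names are made up for illustration.

```python
import subprocess
from pathlib import Path


def pdftotext_command(pdf_path, txt_path, keep_layout=False):
    """Build the argument list for the pdftotext command-line tool."""
    cmd = ["pdftotext", "-enc", "UTF-8"]
    if keep_layout:
        # -layout keeps the physical arrangement of the text on the page
        cmd.append("-layout")
    cmd += [str(pdf_path), str(txt_path)]
    return cmd


def extract_text(pdf_path, keep_layout=False):
    """Run pdftotext (assumed to be on the PATH) and return the text layer."""
    txt_path = Path(pdf_path).with_suffix(".txt")
    subprocess.run(pdftotext_command(pdf_path, txt_path, keep_layout), check=True)
    return txt_path.read_text(encoding="utf-8")
```

If the scanner really embedded a text layer (Ctrl-F working suggests it did), extract_text("scan.pdf") should return it. If the result is empty, the PDF probably contains only images.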
I have PDF files with embedded OCR data (I have already OCR'd them), so they are searchable. Now I want to extract this OCR data, because I want to put it in my Tomcat 6 search server. For that I need the plain OCR data.
So my question is: is it possible to extract this embedded OCR data from the PDF files?
It would be nice to get files with coordinates. But it would also be sufficient to get plaintext files.
You should be able to do this with iText or iTextSharp. iTextSharp has essentially no documentation, however, and a good number of its functions are not equivalent to those found in iText.
PDFsharp does not support iref streams. Those are pretty much the only comprehensive open-source solutions. If you do not mind paying, vista solutions may have something for you; they mostly handle workflow, but they have some fairly extensive PDF libraries as well.
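For the coordinates the question asks about, one option not mentioned above is Poppler's build of pdftotext, which has a -bbox flag that writes an XHTML file with one word element per word, carrying xMin, yMin, xMax and yMax attributes. The sketch below only parses that output (the class and function names are hypothetical) and assumes the file was already produced with something like `pdftotext -bbox in.pdf out.html`.

```python
from html.parser import HTMLParser


class BBoxWords(HTMLParser):
    """Collect (text, xmin, ymin, xmax, ymax) tuples from pdftotext -bbox output."""

    def __init__(self):
        super().__init__()
        self.words = []
        self._box = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "word":
            a = dict(attrs)
            # HTMLParser lowercases attribute names, so xMin arrives as xmin
            self._box = tuple(float(a[k]) for k in ("xmin", "ymin", "xmax", "ymax"))
            self._text = []

    def handle_data(self, data):
        if self._box is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "word" and self._box is not None:
            self.words.append(("".join(self._text), *self._box))
            self._box = None


def parse_bbox(xhtml):
    """Return a list of (text, xmin, ymin, xmax, ymax) for every word element."""
    parser = BBoxWords()
    parser.feed(xhtml)
    parser.close()
    return parser.words
```

The coordinates are in PDF points, measured from the top-left corner of the page as far as I know, so check that against a known page before relying on it.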
I want to convert a PDF file into XLS using VB.NET. How can I do it? I don't want any third-party software.
Interpreting PDF in pure VB.NET without third-party software would be a pretty ambitious task, especially if it has to handle any PDF. There are tools like Ghostscript that can convert PDF files to images and other formats, but I'm not sure that will help.
One idea might be to convert the PDF to HTML first; it may be easier to render HTML in an XLS file.
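A variation on that idea, sketched here in Python rather than VB.NET: if the text can be pulled out of the PDF with its column layout preserved, splitting each line on runs of two or more spaces yields rows that can be written as CSV, which Excel opens directly. This swaps CSV in for the HTML step suggested above, and it assumes the text has already been extracted by some other tool. It will misfire on tables whose cells contain double spaces or are empty.

```python
import csv
import io
import re


def layout_text_to_rows(text):
    """Split layout-preserved text into rows of cells on runs of 2+ spaces."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines between table rows
        rows.append(re.split(r"\s{2,}", line.strip()))
    return rows


def rows_to_csv(rows):
    """Serialize the rows as CSV text that a spreadsheet can open."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

This is only a heuristic for simple, well-aligned tables; anything with merged cells or wrapped text needs a real table-extraction tool.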
I'm currently writing my bachelor's thesis in LaTeX using TeXnicCenter. I want to be able to send the generated PDF file to people so that they can write comments on it.
It seems that commenting is not allowed by default. How do I change this?
I am compiling straight to PDF with pdflatex and using Acrobat Reader 9 to read and comment on the files.
I think your problem is that Acrobat Reader doesn't allow commenting on documents that were not produced by Adobe-approved products, and I don't think pdflatex is one of them.
You should look at the free PDF-XChange Viewer, which allows you to comment on and annotate the text. It's a portable Windows app (download), so it doesn't need to be installed on your machine or the reviewers' machines.
In order to comment using the free Adobe Reader application, the document needs to be signed with a cryptographic key only available from Adobe's commercial (non-free, for-pay) software suites. Likewise, if one is using Adobe Acrobat (not the free Reader) to view a PDF document, commenting may be activated -- or so I hear. The idea here is that it takes some piece of commercial Adobe software in the scenario -- be it producer or consumer -- to make commenting possible.
There are other free PDF producer and consumer applications that allow some form of annotation, but none of them are equivalent to the "native" form offered by Adobe's products.
Strange... I just finished my master's thesis using TeXnicCenter and the MiKTeX distribution, and comments worked just fine. What build profile do you use? Straight to PDF with pdflatex, or via the PS->PDF route? You might want to try the pdflatex method.
(EDIT): ah, we used Acrobat Pro for commenting, so that's why it did work in our case... Thanks rsg!
You can download the 30-day trial of Acrobat Professional 9 and enable the required user rights on the PDF so that reviewers can comment using Acrobat Reader.
I would definitely have a look at the LaTeX Web Companion. There is a whole section about generating PDF from LaTeX, including esoterica such as forms.
We are developing a small application that, given a directory of PDF files, creates a single PDF file containing all of them. This is a simple task using iTextSharp. The problem appears when the directory also contains other files, such as Word or Excel documents.
My question is: is there a way to convert Word and Excel documents into PDF programmatically? And even better, is this possible without having the Office suite installed on the computer running the application?
Office 2007 allows for this. I have found PDFCreator to be good, the VBA is included in sample files, and have heard that CutePDF is also good. PDFCreator and CutePDF are free.
To work without Office, you would need viewers, as far as I know:
http://www.microsoft.com/downloads/details.aspx?FamilyID=c8378bf4-996c-4569-b547-75edbd03aaf0&displaylang=EN
http://www.microsoft.com/downloads/details.aspx?familyid=95E24C87-8732-48D5-8689-AB826E7B8FDF&displaylang=en
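Another route that avoids Office altogether, not mentioned above, is LibreOffice, which can run without a GUI and convert documents from the command line. The Python sketch below assumes LibreOffice is installed and that its soffice executable is on the PATH; the function names are hypothetical, and how faithfully a given Word or Excel file is rendered is something to test yourself.

```python
import subprocess
from pathlib import Path


def libreoffice_pdf_command(document, out_dir, soffice="soffice"):
    """Build the headless LibreOffice command that converts one document to PDF."""
    return [
        soffice,
        "--headless",            # run without opening a window
        "--convert-to", "pdf",   # target format
        "--outdir", str(out_dir),
        str(document),
    ]


def convert_to_pdf(document, out_dir, soffice="soffice"):
    """Run the conversion and return the path LibreOffice is expected to write."""
    subprocess.run(libreoffice_pdf_command(document, out_dir, soffice), check=True)
    return Path(out_dir) / (Path(document).stem + ".pdf")
```

The resulting PDF keeps the input's base name, so report.docx converted into out ends up as out/report.pdf.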
I needed to do this myself, but managed to get it done with .Net and without 3rd party tools:
MSDN: Saving Word 2007 Documents to PDF and XPS Formats
Pretty simple, about 50 lines of code. However, I think you will need Word 2007 installed on the machine, as well as the ability to Save As PDF.
To convert Word documents to PDF, take a look at jWordConvert, a Java library that can do exactly that. It will not work with the Excel files, though, only with the Word files. The library is Java, not C#, but you could switch to iText (which is Java) instead of iTextSharp.
You can also use a component like activePDF's DocConverter to convert a lot of formats to PDF.
Use the PDFMaker that comes with Adobe Acrobat 7 to 9.
I just used this code: Convert Doc to PDF
I'm surprised Aspose wasn't mentioned here, it's easy, simple, and reliable. Downside is that it is not free.
I've used iTextSharp in the past. It's really good and easy to install (one DLL, I believe). The merge takes a bit of tinkering, so it's not as easy to use as Aspose, but hey, it's free, and that is the best part.
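Whichever library does the merge, the directory scan from the question can be handled up front by sorting the files into three groups: PDFs that are ready to merge, Office documents that need converting first, and everything else. A minimal Python sketch follows; the extension list and the function name are assumptions, not something from the posts above.

```python
from pathlib import Path

# Extensions treated as convertible Office documents (an assumed list)
CONVERTIBLE = {".doc", ".docx", ".xls", ".xlsx"}


def plan_merge(filenames):
    """Split file names into (ready PDFs, documents to convert, skipped files)."""
    ready, to_convert, skipped = [], [], []
    # Sort case-insensitively so the merge order is predictable
    for name in sorted(filenames, key=str.lower):
        suffix = Path(name).suffix.lower()
        if suffix == ".pdf":
            ready.append(name)
        elif suffix in CONVERTIBLE:
            to_convert.append(name)
        else:
            skipped.append(name)
    return ready, to_convert, skipped
```

The converted files can then be added to the ready list before handing everything to the merge step.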
TallPDF.NET (comes with a hefty price tag) allows you to serve dynamic PDF from any .NET application including ASP.NET pages and web services.
PDFEdit (free and open source) is an editor for manipulating PDF documents. It has a GUI version and a command-line interface. Scripting is used to a great extent in the editor and almost anything can be scripted. It is possible to create your own scripts or plugins.
The most common way to convert files to PDF is to print them to a PDF printer driver. There are a number of such drivers; one that I know will do the job is Black Ice.
Another is to use Adobe Acrobat's SDK. From memory, it's very expensive.
It's been a while since I have actually done any work with converting PDFs, and the landscape may have changed.