Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
We don’t allow questions seeking recommendations for books, tools, software libraries, and more. You can edit the question so it can be answered with facts and citations.
Closed 7 years ago.
I'd like to embed a PDF file viewer in a window of my planned-to-be open-source application. I don't want to release my application under the GPL, though, and most open-source PDF libraries are GPL-licensed (Poppler, Ghostscript, MuPDF).
Is there a PDF viewer library available under a non-viral open-source license?
Thanks,
It seems that there is a new BSD-licensed contender: PDFium.
IANAL. Blah blah blah.
Using Ghostscript by shelling out to a command line will not require you to change your licensing in any way. Batch files used to call Ghostscript aren't automagically GPL'ed.
With GPL, I'd always understood that it boiled down to "Separate process? Separate license!".
So you just have GS whip up a relatively high-DPI image of the PDF page in question, and let the user pan and zoom around in that. Because GS runs in a separate process, you could fire off additional page requests in the background so the user won't perceive a delay when paging back and forth. GS takes a page range as one of its conversion parameters.
What you couldn't do is generate an image of a small part of an individual PDF page at high DPI/zoom. IIRC, you have to render the whole page.
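A minimal sketch of the shell-out approach described above, in Python. The function names, the `page-%03d.png` naming pattern, and the default of 300 DPI are my own assumptions for illustration; the Ghostscript switches themselves (`-dBATCH`, `-dNOPAUSE`, `-sDEVICE=png16m`, `-r`, `-dFirstPage`, `-dLastPage`, `-sOutputFile`) are standard ones. On Windows the executable is usually `gswin64c` rather than `gs`.

```python
import subprocess
from pathlib import Path


def build_render_command(pdf_path, out_dir, first_page, last_page,
                         dpi=300, gs_exe="gs"):
    """Build a Ghostscript command that rasterizes a page range to PNG files."""
    # Ghostscript expands %03d itself, one output file per rendered page.
    out_pattern = str(Path(out_dir) / "page-%03d.png")
    return [
        gs_exe,
        "-dBATCH",                     # exit when done, no interactive prompt
        "-dNOPAUSE",                   # do not pause after each page
        "-sDEVICE=png16m",             # 24-bit colour PNG output
        f"-r{dpi}",                    # render resolution in DPI
        f"-dFirstPage={first_page}",   # start of the page range
        f"-dLastPage={last_page}",     # end of the page range
        f"-sOutputFile={out_pattern}",
        str(pdf_path),
    ]


def render_pages_in_background(pdf_path, out_dir, first_page, last_page, dpi=300):
    """Start Ghostscript as a separate process and return without waiting."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    cmd = build_render_command(pdf_path, out_dir, first_page, last_page, dpi)
    # Popen returns immediately; call .wait() or .poll() on the result later.
    return subprocess.Popen(cmd)
```

The viewer could call `render_pages_in_background("manual.pdf", "cache", 4, 6)` while the user is still reading page 3, then load the PNGs once the process has finished. As far as I know, the counter in the output file name counts rendered pages rather than PDF page numbers, so check the resulting names before relying on them.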
If your application is open source and free, then you should consider hosting the Adobe Reader ActiveX control (which requires Adobe Reader to be installed). The behavior would be the same as Adobe Reader embedded in the Internet Explorer or Firefox browsers.
A lot of users already have Adobe Reader or Foxit Reader installed on their computers.
Closed 1 year ago.
I am working on a JavaScript-based application and there is a requirement to display PDF files in browsers. At the moment I am using an approach to embed a PDF file on the fly. This approach works in Firefox and Chrome because those browsers have a built-in PDF viewer, but it does not work in Internet Explorer. So I am looking for a cross-browser solution other than the pdf.js library.
There is no alternative to Mozilla's PDF.js, the cross-browser JavaScript solution that can fully parse and render PDFs on the client (browser) side. You can try compiling a C library (PDFium or Poppler) to asm.js/WebAssembly, but you will only get a rasterizer that turns a PDF page into a bitmap. PDF.js attempts to provide more, e.g. painting via the canvas API or building a DOM text layer for accessibility purposes.
All other JavaScript "viewers" need a server-side component, or they pre-process the PDF beforehand into some other format such as PNG (which technically is not a PDF anymore).
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
This question does not appear to be about a specific programming problem, a software algorithm, or software tools primarily used by programmers. If you believe the question would be on-topic on another Stack Exchange site, you can leave a comment to explain where the question may be able to be answered.
Closed 6 years ago.
I am looking for PDF optimization or pre-processing techniques to convert print-ready PDFs to media-ready PDFs (press PDF to web PDF).
The target devices for rendering the PDF are iOS and Android.
Tools like Adobe Acrobat Pro provide settings for such tasks, for example reducing or merging layers.
The expected output PDF should have only three layers:
1) Text
2) Image
3) Special effects
We can do this by using preflight to select the layers and merge them.
Are there any steps to do this effectively? I don't want to do this optimization manually at the page level.
Can I import a layer (say, multiple image.tiff files) onto multiple pages in a single run?
You can use Ghostscript for that.
If you want to do that via a Ghostscript user interface, you can download Ghostscript Studio (an IDE) and use these switches in the Ghostscript Processor:
! >> interaction-related parameters
-dBATCH ! exits after processing instead of entering the interactive loop
-dNOPAUSE ! disables the prompt and pause at the end of each page
! >> device selection parameters
-sDEVICE=pdfwrite ! PDF output device
-dPDFSETTINGS=/ebook
Also, take a look at this answer: Optimize PDF files (with Ghostscript or other)
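If you would rather script this than go through an IDE, here is a rough Python sketch that passes the same switches to the Ghostscript command-line executable and loops over a folder, so nothing has to be done by hand per file. The function names and the folder layout are assumptions of mine, and `gs` is the usual executable name on Linux/macOS (on Windows it is typically `gswin64c`). Note that this only recompresses and downsamples through `pdfwrite`; it does not by itself merge content into the three layers asked about.

```python
import subprocess
from pathlib import Path


def build_optimize_command(src_pdf, dst_pdf, preset="/ebook", gs_exe="gs"):
    """Build a Ghostscript pdfwrite command using one of the PDFSETTINGS presets."""
    return [
        gs_exe,
        "-dBATCH",                   # exit after processing
        "-dNOPAUSE",                 # no prompt between pages
        "-sDEVICE=pdfwrite",         # write a new PDF
        f"-dPDFSETTINGS={preset}",   # e.g. /screen, /ebook, /printer, /prepress
        f"-sOutputFile={dst_pdf}",
        str(src_pdf),
    ]


def optimize_folder(src_dir, dst_dir, preset="/ebook"):
    """Run the optimization over every PDF in src_dir, writing into dst_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for src in sorted(Path(src_dir).glob("*.pdf")):
        cmd = build_optimize_command(src, dst / src.name, preset)
        subprocess.run(cmd, check=True)  # raises if Ghostscript reports an error
```

For example, `optimize_folder("press_pdfs", "web_pdfs")` would write an `/ebook`-preset copy of each file; `/screen` gives smaller files at lower image resolution.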
Closed 7 years ago.
We have a bunch of documents in our organization that were inadvertently saved as Adobe PDF packages (also known as PDF 1.7 "collections").
We would like to convert these to normal PDFs (most of these "packages" contain one bog-standard PDF file), but given the number of files, doing it manually is not feasible.
Any Adobe expert know whether:
There is an open-source or free library that handles PDF package format that I can write a script around?
Does Adobe Pro 9 have a relevant scriptable interface that would allow me to extract the relevant file from each package?
Alternatively, I'm looking at a macro-based approach, but I'd rather not go this route until investigating other options.
Thanks!
After a bunch of digging around, I found pdftk, which is distributed as source and binary on many platforms.
It does almost all of what we need to do, and we can now iterate through our documents and recursively call pdftk on each (some are multi-level attachment chains).
Note pdftk will only burst pages of the visible document into individual documents. The hidden documents remain hidden.
The option you need to use is unpack_files.
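For anyone wanting to script the recursive part, this is a sketch of how it could look in Python. It assumes `pdftk` is on the PATH and uses its documented `unpack_files` operation with `output <directory>`; the function names, the `_files` directory suffix, and the depth limit are my own choices, not anything pdftk prescribes.

```python
import subprocess
from pathlib import Path


def build_unpack_command(package_pdf, out_dir, pdftk_exe="pdftk"):
    """Build the pdftk command that extracts attachments into out_dir."""
    return [pdftk_exe, str(package_pdf), "unpack_files", "output", str(out_dir)]


def unpack_recursively(package_pdf, out_dir, max_depth=5):
    """Extract attachments, then repeat for every extracted PDF (attachment chains)."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_unpack_command(package_pdf, out_dir), check=True)

    if not any(out_dir.iterdir()):
        out_dir.rmdir()  # nothing was attached, so drop the empty directory
        return
    if max_depth <= 0:
        return  # guard against unexpectedly deep chains

    for extracted in sorted(out_dir.glob("*.pdf")):
        nested_dir = out_dir / (extracted.stem + "_files")
        unpack_recursively(extracted, nested_dir, max_depth - 1)
```

Calling `unpack_recursively("package.pdf", "package_files")` leaves the contained documents in `package_files`, with any nested attachments in sub-directories next to them.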
Yet another unwanted obfuscation format to hinder interoperability therefore classified as malware.
Using Adobe Acrobat Professional, combine everything into one PDF and then split it by bookmark level.
I understand this thread is a few years old, but if anyone is looking for a free utility to extract files from PDF packages (especially from large collections), then check out the free utility ByteScout PDF Multitool: it was tested against 500+ MB package files to extract hundreds of multilevel chained attachments.
Disclaimer: I'm affiliated with ByteScout
Closed 4 years ago.
My application includes its manual as a PDF file, and I want the user to be able to read the manual without leaving the application and with minimal overhead.
Do you know of any free (as in beer) component for .NET that can just read PDF files? (I don't need editing.)
Thank you.
P.S.: Yes, I did Google it, but I couldn't find a free one.
P.P.S.: If I don't need to install anything on the target computer, that would be perfect!
Edit - Added
You don't specify what you're using as a development language. I'm guessing that it's some .Net language. If not, this will NOT be helpful to you.
End Added Content
Is this a Windows Forms application?
I don't know if you've thought of this, but you can create a form with a WebBrowser control and point it at the PDF document you're talking about (for example, through the control's Url property or its Navigate method). This form can be controlled by your application. The WebBrowser control will just use whatever version of Adobe Acrobat that Internet Explorer would use on the client's PC. Almost every computer out there has some version of the Acrobat viewer, so there is very little chance you would need to install anything.
The reasons I recommend this are:
No need to buy a component
It works. Simply, beautifully, and it's as error free as just opening the PDF via Internet Explorer.
Closed 7 years ago.
Is there a good library for extracting text from a PDF? I'm willing to pay for it if I have to.
Something that works with C# or classic ASP (VBScript) would be ideal and I also need to be able to separate the pages from the PDF.
This question had some interesting stuff, especially pdftotext but I'd like to avoid calling to an external command-line app if I can.
You can use the IFilter interface built into Windows to extract text and properties (author, title, etc.) from any supported file type. It's a COM interface, so you would have to use the .NET interop facilities.
You'd also have to download the free PDF IFilter driver from Adobe.
Here is a good list:
Open Source Libs for PDF/C#
Most of these are geared toward creating PDFs, but they should have read capability as well.
There is this one as well: iText
I have only played with iText before. Nothing major.
We've used Aspose with good results.
In addition to the accepted answer: there are also alternative commercial solutions to replace the Adobe IFilter for text indexing (providing a similar API but also offering additional premium functionality):
Foxit PDF IFilter: provides much faster text indexing compared to Adobe's plugin.
PDFLib PDF iFilter: includes support for damaged PDF documents plus an additional API to run your own queries.
If you are looking for a single tool that can be used from both managed .NET apps and legacy programming languages like classic ASP or VB6, then this is where the commercial ByteScout PDF Extractor SDK would fit, as it provides both .NET and ActiveX/COM APIs.
Disclaimer: I work for ByteScout
Docotic.Pdf library can be used to extract formatted or plain text from PDF documents.
The library can read PDF documents of any version (up to the latest published standard). Extraction of pages is also supported by the library.
Links to sample code:
How to extract text from PDF
How to extract PDF pages
Disclaimer: I work for the vendor of the library.