Include custom fonts in PDF

I have a question about generating PDFs with wkhtmltopdf. I know it's possible to use custom fonts in my HTML, but I think the operating system viewing the PDF must have these fonts installed. Is that correct?
My question is whether it's possible to include these fonts in the PDF itself. That way, once the PDF is generated, I can send it to a print shop to print 50 copies, and they will see the PDF exactly as I do without having these fonts installed.

This is certainly possible.
In PDF lingo it's called "embedding a font".
Most PDF generation libraries should support this.
PDF comes in different flavors (standards). One of them, PDF/A, is meant for long-term storage (the A stands for archiving). The idea is that the document's look and feel should be preserved as much as possible. To achieve this without depending on the operating system (and the fonts it happens to ship with), the PDF/A standard requires the fonts to be embedded.
https://en.wikipedia.org/wiki/PDF/A
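If you want to confirm that a generated PDF really carries its fonts (for example before sending it to the print shop), one rough check is to look for the keys a font descriptor uses to point at an embedded font program: /FontFile (Type 1), /FontFile2 (TrueType) or /FontFile3 (CFF/OpenType). The following is a minimal stdlib-only Python sketch, a heuristic that scans the raw bytes and any Flate-compressed streams rather than a real PDF parser; the function names are hypothetical, and a proper PDF library or PDF/A validator is the reliable answer.

```python
import re
import zlib

# A FontDescriptor points at an embedded font program through one of these keys:
# /FontFile (Type 1), /FontFile2 (TrueType), /FontFile3 (CFF or OpenType).
EMBED_KEY = re.compile(rb"/(FontFile[23]?)\b")
# Raw stream bodies; the lookbehind stops the "stream" inside "endstream" from matching.
STREAM = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.S)


def searchable_chunks(pdf: bytes):
    """Yield the raw file plus every stream body that inflates as Flate data.

    Heuristic: PDF 1.5+ files often hide font descriptors inside compressed
    object streams, so those are inflated and searched too.
    """
    yield pdf
    for m in STREAM.finditer(pdf):
        try:
            # decompressobj tolerates the trailing end-of-line before "endstream"
            data = zlib.decompressobj().decompress(m.group(1))
        except zlib.error:
            continue  # not Flate-compressed (or another filter): skip it
        yield data


def embedded_font_keys(pdf: bytes) -> set:
    """Return which FontFile* keys occur, i.e. which kinds of font are embedded."""
    found = set()
    for chunk in searchable_chunks(pdf):
        found.update(k.decode("ascii") for k in EMBED_KEY.findall(chunk))
    return found
```

Usage would be something like `embedded_font_keys(open("out.pdf", "rb").read())`. A non-empty result only shows that at least one font is embedded, not that every font is.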
I don't know how to do this in the library you are using, but I do know it's possible with iText.
This is a great tutorial on it: besides giving you more information about iText, it illustrates the problem with custom fonts in a very visual way.
https://developers.itextpdf.com/tutorial/using-fonts-pdf-and-itext
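On the wkhtmltopdf side, a custom font is normally pulled into the HTML with a CSS @font-face rule. Whether the renderer then embeds it is something to verify on the output rather than assume. A minimal Python sketch, assuming a local font file; MyFont.ttf, the family name and both helper functions are placeholders of my own, and the --enable-local-file-access flag is, as far as I know, needed on wkhtmltopdf 0.12.6 and later to read file:// URLs.

```python
from pathlib import Path


def html_with_font(font_path: Path, family: str, body: str) -> str:
    """Build an HTML page that loads a local font file through CSS @font-face."""
    font_url = font_path.resolve().as_uri()  # absolute file:///... URL
    return f"""<!DOCTYPE html>
<html><head><meta charset="utf-8"><style>
@font-face {{ font-family: "{family}"; src: url("{font_url}"); }}
body {{ font-family: "{family}", serif; }}
</style></head><body>{body}</body></html>
"""


def wkhtmltopdf_cmd(html_file: Path, pdf_file: Path) -> list:
    # wkhtmltopdf 0.12.6 blocks local file access by default; this flag re-enables
    # it so the file:// font URL in the HTML can be loaded.
    return ["wkhtmltopdf", "--enable-local-file-access", str(html_file), str(pdf_file)]
```

Write the HTML to disk, run the command with `subprocess.run(wkhtmltopdf_cmd(...), check=True)`, and then check the result in a viewer: Document Properties > Fonts in Adobe Reader lists each font and whether it is embedded.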

Related

Better tiny format PDF over EPUB or MOBI?

I want to write a book and publish it myself, without a publishing house. I want to produce the printed paper version. I already know LaTeX, so I can make beautiful PDFs.
Since many people have an e-book reader, I think it would be worth creating an e-book as well. I can produce a PDF in a very small format (not A4), but I can't produce an EPUB as easily.
Is it okay to publish my book only as a PDF, or will people not buy it because they prefer EPUB or MOBI?
I think a lot of this depends on the audience of your book. Some things to think about:
One of the biggest differences between PDF and EPUB is that EPUB documents reflow their content to fit the size of the reader's screen, so one file can support devices of many sizes. PDF files are reportedly larger, but I think this depends on how the PDF is created, whether the EPUB has embedded fonts, and so on. See this SuperUser question about EPUB vs MOBI vs PDF.
Ages ago, when I worked on Kindle devices, I found the font and layout support lacking. EPUB does let you embed your own fonts in the file, and most (but not all) readers will use those fonts and render things exactly the way you want. I suspect MOBI files are still not there yet. Since it sounds like you care about how your book renders on readers, MOBI may not be for you.
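An EPUB is a ZIP container, so embedded fonts are simply font files packaged alongside the XHTML content (and declared in the package manifest). A quick way to see which fonts a given EPUB carries is to list them. This stdlib-only Python sketch matches by file extension, which is a heuristic: the manifest's media types are the authoritative list, and the function name is hypothetical.

```python
import io
import zipfile

# File extensions commonly used for fonts packaged in an EPUB.
FONT_EXTENSIONS = (".ttf", ".otf", ".woff", ".woff2")


def epub_fonts(epub_bytes: bytes) -> list:
    """Return the font files packaged inside an EPUB (an EPUB is a ZIP container)."""
    with zipfile.ZipFile(io.BytesIO(epub_bytes)) as zf:
        return sorted(
            name for name in zf.namelist()
            if name.lower().endswith(FONT_EXTENSIONS)
        )
```

For a file on disk: `epub_fonts(Path("book.epub").read_bytes())`. An empty list means the reader will fall back to its own fonts.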
There are programs out there such as Calibre that will convert files from one format to another, for what it's worth.
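Calibre's converter can also be scripted through its ebook-convert command-line tool, which takes an input file and an output file and picks the output format from the output file's extension. A minimal Python sketch; the wrapper names are hypothetical. Expect to proofread the result, since turning fixed-layout PDF pages into reflowable text is lossy.

```python
import shutil
import subprocess
from pathlib import Path


def calibre_convert_cmd(src: Path, target_ext: str = ".epub") -> list:
    """ebook-convert INPUT OUTPUT: the output extension selects the target format."""
    return ["ebook-convert", str(src), str(src.with_suffix(target_ext))]


def convert(src: Path, target_ext: str = ".epub") -> None:
    if shutil.which("ebook-convert") is None:
        raise FileNotFoundError("Calibre's ebook-convert is not on PATH")
    subprocess.run(calibre_convert_cmd(src, target_ext), check=True)
```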

Are there tutorials and examples of how to interpret PDF documents?

I am using tools such as PDFBox to interpret PDF files (including text, strokes, glyphs and images) and can access the streams and dictionaries. I am not clear on how these components link together and how to interpret them. In particular I would like to know how to access fonts from the streams.
NOTE: I am not interested in tutorials on how to create PDF documents
You should probably start by reading the PDF Reference. It's a huge document, but you can read just the relevant parts.
To understand font streams, you basically need to read about the TrueType and Type 1 font formats (not easy reading either). A PDF may contain other font types, but TrueType and Type 1 are probably the most widely used.
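The chain that links these pieces together runs page → /Resources → /Font dictionary → font dictionary → /FontDescriptor → /FontFile (Type 1), /FontFile2 (TrueType) or /FontFile3 (CFF/OpenType) stream, and that last stream holds the font program itself. The following stdlib-only Python sketch follows that chain. It is deliberately minimal and assumes uncompressed indirect objects, direct /Length values and only the FlateDecode filter; object streams, xref streams and composite (Type0) fonts are exactly what a library like PDFBox handles for you. All function names are hypothetical.

```python
import re
import zlib

# Minimal reader for *uncompressed* indirect objects ("N G obj ... endobj").
# Assumption: no object streams or xref streams (PDF 1.5+).
OBJ = re.compile(rb"(\d+)\s+\d+\s+obj\b(.*?)\bendobj", re.S)


def read_objects(pdf: bytes) -> dict:
    """Map object number -> raw object body."""
    return {int(m.group(1)): m.group(2) for m in OBJ.finditer(pdf)}


def ref(obj: bytes, key: str):
    """Object number that /key points to ("/key N 0 R"), or None."""
    m = re.search(rb"/" + key.encode() + rb"\s+(\d+)\s+\d+\s+R", obj)
    return int(m.group(1)) if m else None


def page_fonts(page: bytes) -> dict:
    """Resource name -> font object number, for an inline /Font dictionary."""
    m = re.search(rb"/Font\s*<<(.*?)>>", page, re.S)
    if m is None:
        return {}
    pairs = re.findall(rb"/([^\s/<>\[\]()]+)\s+(\d+)\s+\d+\s+R", m.group(1))
    return {name.decode(): int(num) for name, num in pairs}


def stream_data(obj: bytes) -> bytes:
    """Stream bytes of an object, inflated if /FlateDecode (other filters unsupported)."""
    m = re.search(rb"stream\r?\n", obj)
    if m is None:
        raise ValueError("object has no stream")
    header = obj[: m.start()]
    length = re.search(rb"/Length\s+(\d+)\s*(?:/|>>)", header)  # direct length only
    if length:
        data = obj[m.end(): m.end() + int(length.group(1))]
    else:
        data = obj[m.end(): obj.rindex(b"endstream")]
    if b"/FlateDecode" in header:
        return zlib.decompressobj().decompress(data)
    return data


def embedded_font_program(pdf: bytes, font_num: int):
    """Follow font -> /FontDescriptor -> /FontFile* and return (key, font bytes)."""
    objs = read_objects(pdf)
    desc_num = ref(objs[font_num], "FontDescriptor")
    if desc_num is None:
        # e.g. a standard 14 font, or a Type0 font (its descriptor is on the descendant font)
        return None
    for key in ("FontFile", "FontFile2", "FontFile3"):
        file_num = ref(objs[desc_num], key)
        if file_num is not None:
            return key, stream_data(objs[file_num])
    return None  # descriptor present but no font program: the font is not embedded
```

The bytes returned for /FontFile2 are a TrueType file, which is the point where a font library such as FreeType takes over.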
Fiddling with fonts can be complicated, so you will probably find it easier to use a font library such as FreeType to extract information from PDF font streams.
There are lots of good articles on planetpdf.com, and many PDF developers run blogs with useful general articles. We have published a whole load of them on our blog (http://www.jpedal.org/PDFblog/).

Creating Thumbnail from PDF without Adobe SDK

I've been looking for a way to generate thumbnails from PDF files, like the ones shown in Explorer. The problem is that without Adobe Pro, the free version does not expose all the COM interfaces. Is there any other way? Please help.
Ghostscript (which is what ImageMagick uses to render PDFs) will generate images in a wide variety of formats. If you need something really obscure, use the ImageMagick wrapper; otherwise, I prefer calling Ghostscript directly.
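Calling Ghostscript directly for a first-page thumbnail looks roughly like this. The switches (-sDEVICE=png16m, -r, -dFirstPage/-dLastPage, -sOutputFile, the anti-aliasing options) are standard Ghostscript options; the Python wrapper, its function names and the 36 DPI default are my own assumptions.

```python
import shutil
import subprocess
import sys


def ghostscript_exe() -> str:
    """Console Ghostscript is gswin64c/gswin32c on Windows and gs elsewhere."""
    if sys.platform == "win32":
        for name in ("gswin64c", "gswin32c"):
            if shutil.which(name):
                return name
    return "gs"


def thumbnail_args(pdf_path: str, png_path: str, dpi: int = 36, gs: str = "gs") -> list:
    """Ghostscript arguments that render only page 1 of a PDF to a PNG file."""
    return [
        gs,
        "-dSAFER", "-dBATCH", "-dNOPAUSE", "-dQUIET",
        "-sDEVICE=png16m",                                # 24-bit color PNG
        f"-r{dpi}",                                       # low DPI gives a small image
        "-dTextAlphaBits=4", "-dGraphicsAlphaBits=4",     # smoother text and line art
        "-dFirstPage=1", "-dLastPage=1",
        f"-sOutputFile={png_path}",
        pdf_path,
    ]


def make_thumbnail(pdf_path: str, png_path: str, dpi: int = 36) -> None:
    subprocess.run(thumbnail_args(pdf_path, png_path, dpi, ghostscript_exe()), check=True)
```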
If you can afford a commercial option, you could use Amyuni PDF Creator ActiveX for this task (or the .NET version if that suits your needs better). With this product you can create JPG/PNG/BMP images of the first page of your PDF files at a specified resolution and then use them as thumbnails.
Disclaimer: I am part of the development team of this product.
Here are other SO questions proposing other approaches (not involving COM):
Using ImageMagick on the command line
Thumbnail of a PDF page (Java)

How to extract embedded OCR data from a PDF?

I have PDF files with embedded OCR data (I already OCR'd them), so they are searchable. Now I want to extract this OCR data because I want to put it into my search server on Tomcat 6. To do this, I need the plain OCR data.
So my question is: is it possible to extract this embedded OCR data from the PDF files?
It would be nice to get output with coordinates, but plain-text files would also be sufficient.
You should be able to do this with iText or iTextSharp. iTextSharp has essentially no documentation, however, and a good number of its functions are not equivalent to those found in iText.
PDFsharp does not support iref (cross-reference) streams. Those are pretty much the only comprehensive open-source solutions. If you do not mind paying, Vista Solutions may have something for you; they mostly handle workflow, but they have some pretty extensive PDF libraries as well.
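To show what text extraction involves under the hood, here is a deliberately naive stdlib-only Python sketch; it is not iText and the function names are hypothetical. It inflates Flate-compressed streams, tokenizes content streams and collects the strings shown by the Tj, TJ, ' and " operators, treating horizontal Td moves and Tm as word gaps and other line moves as line breaks. Big assumption: it decodes shown bytes as Latin-1, so it only works for simple single-byte encodings. Some OCR tools write their invisible text layer with composite (Type0/Identity-H) fonts, whose bytes are glyph IDs that only the font's /ToUnicode CMap can map back to characters; for those, and for coordinates, use a real library such as iText's text extraction.

```python
import re
import zlib

# Content-stream tokens: literal strings (one level of nested parentheses),
# hex strings, array brackets, and bare words (numbers and operators).
TOKEN = re.compile(
    rb"\((?:\\.|[^\\()]|\((?:\\.|[^\\()])*\))*\)"
    rb"|<[0-9A-Fa-f\s]*>|\[|\]|[^\s()<>\[\]/%]+",
    re.S,
)
NUMBER = re.compile(rb"[+-]?(?:\d+\.?\d*|\.\d+)")
STREAM = re.compile(rb"(?<!end)stream\r?\n(.*?)endstream", re.S)
ESCAPES = {b"n": b"\n", b"r": b"\r", b"t": b"\t", b"b": b"\b", b"f": b"\f"}


def decode_literal(tok: bytes) -> bytes:
    def repl(m):
        if m.group(1):                       # \ddd octal escape
            return bytes([int(m.group(1), 8) & 0xFF])
        ch = m.group(2)
        if ch in (b"\n", b"\r", b"\r\n"):    # backslash-newline: line continuation
            return b""
        return ESCAPES.get(ch, ch)           # \( \) \\ and unknown escapes
    return re.sub(rb"\\(?:([0-7]{1,3})|(\r\n|.))", repl, tok[1:-1], flags=re.S)


def decode_hex(tok: bytes) -> bytes:
    digits = re.sub(rb"\s", b"", tok[1:-1]).decode("ascii")
    return bytes.fromhex(digits + "0" * (len(digits) % 2))  # odd length: pad with 0


def extract_text(content: bytes, encoding: str = "latin-1") -> str:
    lines, line, strings, numbers = [], [], [], []

    def newline():
        text = b"".join(line).decode(encoding).strip()
        if text:
            lines.append(text)
        line.clear()

    for tok in TOKEN.findall(content):
        if tok[:1] == b"(":
            strings.append(decode_literal(tok))
        elif tok[:1] == b"<":
            strings.append(decode_hex(tok))
        elif tok in (b"[", b"]"):
            continue
        elif NUMBER.fullmatch(tok):
            numbers.append(float(tok.decode()))
        else:  # an operator: act on its operands, then discard them
            if tok in (b"Td", b"TD"):
                if numbers[-1:] == [0.0]:       # horizontal move: treat as a word gap
                    if line and line[-1] != b" ":
                        line.append(b" ")
                else:
                    newline()
            elif tok == b"Tm":                  # new text matrix: assume a word gap
                if line and line[-1] != b" ":
                    line.append(b" ")
            elif tok in (b"T*", b"'", b'"', b"ET"):
                newline()
            if tok in (b"Tj", b"TJ", b"'", b'"'):
                line.append(b"".join(strings))
            strings, numbers = [], []
    newline()
    return "\n".join(lines)


def pdf_text(pdf: bytes) -> str:
    """Run extract_text over every stream that looks like page content."""
    chunks = []
    for m in STREAM.finditer(pdf):
        try:
            data = zlib.decompressobj().decompress(m.group(1))
        except zlib.error:
            data = m.group(1)  # assume an unfiltered stream
        if b"BT" in data:
            text = extract_text(data)
            if text:
                chunks.append(text)
    return "\n".join(chunks)
```

The plain-text output of `pdf_text(open("scan.pdf", "rb").read())` is the kind of thing you could feed to a search index; anything beyond that (reliable spacing, Unicode mapping, positions) is where a full library earns its keep.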

How to convert Word and Excel documents to PDF programmatically?

We are developing a small application that, given a directory of PDF files, creates a single PDF file containing all the PDF files in the directory. This is a simple task with iTextSharp. The problem arises when the directory also contains files such as Word or Excel documents.
My question is: is there a way to convert Word and Excel documents to PDF programmatically? Even better, is this possible without having the Office suite installed on the computer running the application?
Office 2007 allows for this. I have found PDFCreator to be good (the VBA is included in its sample files), and I have heard that CutePDF is also good. PDFCreator and CutePDF are both free.
To work without Office, you would need viewers, as far as I know:
http://www.microsoft.com/downloads/details.aspx?FamilyID=c8378bf4-996c-4569-b547-75edbd03aaf0&displaylang=EN
http://www.microsoft.com/downloads/details.aspx?familyid=95E24C87-8732-48D5-8689-AB826E7B8FDF&displaylang=en
I needed to do this myself, but managed to get it done with .NET and without third-party tools:
MSDN: Saving Word 2007 Documents to PDF and XPS Formats
It's pretty simple, about 50 lines of code. However, I think you will need Word 2007 installed on the machine, along with its ability to Save As PDF.
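For comparison, here is a rough Python sketch of the same Office-automation idea. It is an assumption on my part, not the MSDN code: it uses pywin32's win32com.client instead of .NET, needs Windows with Word/Excel installed (and, for Word 2007, the Save As PDF capability), and the numeric constants are Word's wdFormatPDF (17) and Excel's xlTypePDF (0). The helper names are hypothetical.

```python
from pathlib import Path

WD_FORMAT_PDF = 17  # Word: WdSaveFormat.wdFormatPDF
XL_TYPE_PDF = 0     # Excel: XlFixedFormatType.xlTypePDF

WORD_EXTS = {".doc", ".docx"}
EXCEL_EXTS = {".xls", ".xlsx"}


def office_app_for(src: Path):
    """Which Office application can open this file, or None (e.g. already a PDF)."""
    ext = src.suffix.lower()
    if ext in WORD_EXTS:
        return "Word"
    if ext in EXCEL_EXTS:
        return "Excel"
    return None


def pdf_target(src: Path) -> Path:
    return src.with_suffix(".pdf")


def convert_with_office(src: Path) -> Path:
    """Save a Word/Excel file as PDF through COM (Windows + Office + pywin32 required)."""
    app = office_app_for(src)
    if app is None:
        raise ValueError(f"no Office converter for {src.suffix!r}")
    import win32com.client  # pywin32, imported here so the helpers above work anywhere

    src = src.resolve()     # Office needs absolute paths
    dst = pdf_target(src)
    if app == "Word":
        word = win32com.client.Dispatch("Word.Application")
        try:
            doc = word.Documents.Open(str(src), False, True)  # FileName, ConfirmConversions, ReadOnly
            doc.SaveAs(str(dst), WD_FORMAT_PDF)               # FileName, FileFormat
            doc.Close(False)                                  # don't save changes
        finally:
            word.Quit()
    else:
        excel = win32com.client.Dispatch("Excel.Application")
        try:
            wb = excel.Workbooks.Open(str(src), 0, True)      # Filename, UpdateLinks, ReadOnly
            wb.ExportAsFixedFormat(XL_TYPE_PDF, str(dst))     # Type, Filename
            wb.Close(False)
        finally:
            excel.Quit()
    return dst
```

In a merge tool like the one in the question, files already ending in .pdf can go straight to the merge, and anything for which `office_app_for` returns "Word" or "Excel" goes through the converter first.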
To convert Word documents to PDF, take a look at jWordConvert, a Java library that does exactly that. It only works with Word files, though, not with Excel files. It's Java rather than C#, but you could switch from iTextSharp to iText (which is Java).
You can also use a component like activePDF's DocConverter to convert a lot of formats to PDF.
Use PDFMaker, which comes with Adobe Acrobat 7 to 9.
I just used this code: Convert Doc to PDF
I'm surprised Aspose wasn't mentioned here; it's easy, simple, and reliable. The downside is that it is not free.
I've used iTextSharp in the past. It's really good and easy to install (one DLL, I believe). The merge takes a bit of tinkering, so it's not as easy to use as Aspose, but hey, it's free, and that is the best part.
TallPDF.NET (comes with a hefty price tag) allows you to serve dynamic PDF from any .NET application including ASP.NET pages and web services.
PDFEdit (free and open source) is an editor for manipulating PDF documents. It has a GUI version and a command-line interface. Scripting is used to a great extent in the editor and almost anything can be scripted. It is possible to create your own scripts or plugins.
The most common way to convert files to PDF is to print them to a PDF printer driver. There are a number of such drivers; one that I know will do the job is Black Ice.
Another option is Adobe Acrobat's SDK. From memory, it's very expensive.
It's been a while since I have actually done any work converting PDFs, and the landscape may have changed.