Automatic fitting of text content to page with rst2pdf - pdf

Does rst2pdf provide automatic adjustment of text size to fit a bullet or numbered list to the page for presentation purposes? Similar to what LaTeX provides.

No, it doesn't have size-to-fit. I use rst2pdf for presentations and I use a consistent font size throughout my slide decks, and I quite like the effect it produces. (Sorry this is probably not the answer you were hoping for!)

Related

Increase font for pdf using Inkscape

I usually produce PDF graphs with R and then modify them in Inkscape.
However, when I increase the font size, the letters get bigger but the letter spacing does not, as you can see in the example.
I have the same problem with PDFs produced by LaTeX.
Thanks for your help.
Perhaps you have broken the text into individual letters, and are applying the new font size to those, rather than to the entire word? You may need to recreate the Xlabel text/group the letters back together.
Although there is an answer for exactly the same question here, I will duplicate it:
You should select the text you want to resize and then remove manual kerning either before or after the resizing. This can be done by clicking Text -> Remove Manual Kerns.

Extract titles from each page of a PDF?

I am working on a project, SIGGRAPH Image Wall.
My first challenge is to figure out how to extract titles of each page in a PDF, SIGGRAPH 2013 Technical Papers First Pages (44 MB PDF).
This PDF is a compilation of the first page of each paper.
Therefore, there is a paper title on every page, which is a little different from a traditional scholarly paper.
Does anyone have any idea for this?
I think you can accomplish this using any of a number of text extraction approaches, though I will caution that getting to 100% accuracy will be tricky...
Some possible tools to use:
pdftotext or pdf2txt - Simple and easy cross-platform extraction utilities.
PDFNet - Robust SDK for digging into PDFs and pulling out exactly the data you want.
Perl modules: PDF::API2, CAM::PDF - I'm a Perl guy so I'd go this route, but I'm sure similar libraries exist in Python, Ruby, etc.
Your source pages look reasonably consistent - I feel like you'll be able to make some smart guesses about where on the page your content will be and what it'll look like. I'd try this out:
Inspect the PDF manually to figure out the title font name and size.
Extract text information for the top portion of the page (something like the top 150 pixels). Make sure to extract font info.
This should get all of your title text and maybe some author names. Parse this data (either within the script you write, or in the XML output files from pdftotext, etc.), keeping only the words that match your title font info.
If the title font varies, you'll need to guess what the title font is for each page and differentiate it from author names (the only other content you should get from the top of the page) which you can probably do simply by comparing font sizes.
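The font-size heuristic described above could be sketched roughly like this. It assumes you have already extracted, for one page, a list of text spans with their font size and vertical position (for example from the XML or bbox output of one of the tools listed). The function name `guess_title`, the `(text, font_size, y_top)` tuple layout and the `top_fraction` parameter are all made up for illustration, not part of any library.

```python
def guess_title(spans, page_height, top_fraction=0.25):
    """Guess a page title from (text, font_size, y_top) spans.

    y_top is assumed to be measured downward from the top of the page,
    in the same units as page_height.
    """
    # Keep only the spans that start in the top part of the page.
    limit = page_height * top_fraction
    top = [s for s in spans if s[2] <= limit]
    if not top:
        return ""
    # Assumption: the title uses the largest font in that region,
    # which separates it from the author names below it.
    title_size = max(size for _, size, _ in top)
    # Small tolerance, since extracted font sizes are often rounded.
    title = [s for s in top if abs(s[1] - title_size) < 0.5]
    # Read multi-line titles from top to bottom.
    title.sort(key=lambda s: s[2])
    return " ".join(text for text, _, _ in title)


# Hypothetical spans for one page (text, font size, distance from top).
example_spans = [
    ("Authors: A. Author, B. Writer", 10.0, 120.0),
    ("A Hypothetical Paper", 18.0, 60.0),
    ("Title", 18.0, 82.0),
    ("Body text further down", 9.0, 400.0),
]
print(guess_title(example_spans, page_height=792.0))
```

If the title font is known in advance (step 1 above), filtering on the font name as well as the size would probably be more reliable than size alone.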

Reading text + graphic (like lines) info from an existing pdf

I want to read an existing PDF and extract the text and graphics information. For graphics, I currently just need the drawn lines. There are many vendor components for reading PDF text, but are there any that can give graphics info too? Free/open-source is preferred, but I'm OK with commercial ones too.
The requirement is:
For every page in PDF:
Reading text blocks
Getting to know the canvas co-ordinate of the text block (rectangle containing the block). Note, for text with higher font size, the rect size will change.
Lines - need collection of (x1,y1,x2,y2) for every line in a page in pdf
Thanks,
- Seeker
This is my field, though the question is a bit old. Hopefully this still helps.
You leave some room for assumptions, so here are mine:
you seek a script, rather than stand-alone software
your object is archival
you are running command-line scripts:
Use this command line script, detailed at: http://stefaanlippens.net/extract-images-from-pdf-documents
you are running server-side code using imagemagick or graphicsmagick functions:
Something like "convert -background white -flatten test1.pdf test1.jpg" (imagemagick) will render the whole PDF page into a jpeg. If you want to then crop it to the image(s), then it depends upon the context of the project to determine the best script(s) to do that.
A rather complex question. If you wish to provide more details about the project, then I can provide some more guidance. Best of luck.
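For the line-segment part of the question, here is a very rough sketch of the idea rather than a complete solution. In a PDF content stream, straight lines are built with the path operators `m` (move to), `l` (line to) and `h` (close path). Assuming you already have a page's decompressed content stream as text (most PDF libraries can give you that), a naive scan could collect the segments. The function name `line_segments` is made up, and the sketch deliberately ignores coordinate transforms (`cm`), rectangles (`re`), curves, string contents, and whether a path is actually stroked.

```python
def line_segments(content):
    """Collect (x1, y1, x2, y2) tuples for segments built with m / l / h.

    `content` is a decompressed PDF content stream as a string.
    Coordinates are returned untransformed, in whatever user space
    the stream was written in.
    """
    segments = []
    operands = []            # numbers seen since the last operator
    start = current = None   # start of subpath and current point
    for token in content.split():
        try:
            operands.append(float(token))
            continue
        except ValueError:
            pass  # not a number, so treat it as an operator
        if token == "m" and len(operands) >= 2:
            start = current = (operands[-2], operands[-1])
        elif token == "l" and len(operands) >= 2 and current is not None:
            point = (operands[-2], operands[-1])
            segments.append(current + point)
            current = point
        elif token == "h" and current is not None and current != start:
            # Closing a subpath draws a segment back to its start.
            segments.append(current + start)
            current = start
        operands = []
    return segments


sample_stream = "1 w 10 20 m 110 20 l 110 70 l h S"
for segment in line_segments(sample_stream):
    print(segment)
```

A real implementation would have to track the current transformation matrix to get page coordinates, so a library that exposes drawing operations is likely a better starting point than parsing by hand.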

PDF Colo(u)r Analysis (without Acrobat itself ?)

Is there a library/tool which would list all colours used in a PDF document ?
I'm sure Acrobat itself would do this but I would like an alternative (ideally something that could be scripted).
So the idea is if you have a very simple PDF document with four colours in it the output might say :
RGB(100,0,0)
RGB(105,0,0)
CMYK(0,0,0,1)
CMYK(1,1,1,1)
You could explore the insides with pdfbox, but you would have to write some code to find and catalog all those colors.
Most PDF tools have access to this information internally but expose no API for it. You could take any tool and add one.
Apago PDFspy generates an XML file containing all kinds of metadata extracted from PDF files. It reports color usage including spot colors.
We recently added a function called GetPageColorSpaces(0) to the Quick PDF Library - www.quickpdflibrary.com to retrieve much of the ColorSpace info used in the document.
Here is some sample output.
Resource,"QuickPDFCS2eb0f578",Separation,"HKS 52 E",DeviceCMYK,0.95,0,0.55,0
Resource,"QuickPDFCSb7b05308",Separation,"Black",DeviceCMYK,0,0,0,1
Resource,"QuickPDFCSd9f10810",Separation,"Pantone 117 C",DeviceCMYK,0,0.18,1,0.15
Resource,"QuickPDFCS9314518c",Separation,"All",DeviceCMYK,0,1,0,0.5
Resource,"QuickPDFCS333d463d",Separation,"noplate",DeviceCMYK,1,0,0,0
Resource,"QuickPDFCSb41cafc4",Separation,"noprint",DeviceCMYK,0,1,0,0
Resource,"Cs10",DeviceN,Black,Colorant,-1,-1,-1,-1
Resource,"Cs10",DeviceN,P1495,Colorant,-1,-1,-1,-1
Resource,"Cs10",DeviceN,CalRGB,Colorant,-1,-1,-1,-1
Resource,"Cs10",Separation,"P1495",DeviceCMYK,0,0.31,0.69,0
XObject,"R29",Image,,DeviceRGB,-1,-1,-1,-1
Disclaimer: I work at Atalasoft.
Our product, DotImage with the PDF Reader add-on, can do this. The easiest way is to rasterize the page and then just use any of our image analysis tools to get the colors.
This example shows how to do it if you want to group similar colors -- the deployed example will only work for PNG and JPEG, but if you download the code, it's trivial to include the add-on and get PDF as well (let me know if you need help)
Source here:
http://www.atalasoft.com/cs/blogs/31appsin31days/archive/2008/05/30/color-scheme-generator.aspx
Run it here:
http://www.atalasoft.com/31apps/ColorSchemeGenerator
If you are working with specific and simple PDF documents from a constrained source then you may be able to find the colors by reading through the content stream. However this cannot be a generic solution.
For example PDF documents can contain gradients or transparency. If your document contains this type of construct then you are likely to end up with a wide range of colors rather than a specific set.
Similarly many PDF documents contain bitmapped images. Given that these will need to be interpolated to be displayed at different resolutions, the set of colors in a displayed PDF may be bigger or different to (though obviously broadly similar to) the embedded bitmap.
Similarly many PDF documents contain constructs in multiple color spaces that are rendered into different color spaces. For example a PDF might contain a DeviceRGB bitmap, a line in an ICC based CMYK color and a Lab based rectangle. The displayed version might be in sRGB for display or CMYK for print. Each of these will influence the precise set of colors.
So the only 100% valid answer is going to be related to a particular render of a PDF at a particular resolution to a particular color space. From the resultant bitmap you can determine the colors that have been used.
There are a variety of PDF libraries that will do this type of render including DotImage (referenced in another answer) and ABCpdf .NET (on which I work).
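For the constrained case mentioned above (simple documents where reading the content stream is enough), a minimal sketch might look like this. It assumes you already have the decompressed content stream as text, and it only recognises the device colour operators `g`/`G` (gray), `rg`/`RG` (RGB) and `k`/`K` (CMYK). The function name `list_colors` is made up. Colours set through `sc`/`scn` with named colour spaces, images, shadings and transparency are all ignored, which is exactly why this cannot be a generic solution.

```python
# Operator -> (colour space label, number of operands).
COLOR_OPERATORS = {
    "g": ("Gray", 1), "G": ("Gray", 1),
    "rg": ("RGB", 3), "RG": ("RGB", 3),
    "k": ("CMYK", 4), "K": ("CMYK", 4),
}


def list_colors(content):
    """Return the distinct device colours set in a content stream.

    `content` is a decompressed PDF content stream as a string.
    Components are in the PDF range 0.0 to 1.0.
    """
    colors = set()
    operands = []  # numbers seen since the last operator
    for token in content.split():
        try:
            operands.append(float(token))
            continue
        except ValueError:
            pass  # not a number, so treat it as an operator
        if token in COLOR_OPERATORS:
            space, count = COLOR_OPERATORS[token]
            if len(operands) >= count:
                colors.add((space, tuple(operands[-count:])))
        operands = []
    return sorted(colors)


sample_stream = "1 0 0 rg 10 10 100 50 re f 0 0 0 1 K 0 0 m 10 10 l S 1 0 0 rg"
for space, components in list_colors(sample_stream):
    print(space, components)
```

Note that the components come out in the 0 to 1 range used inside the PDF, so they would need scaling to match the 0 to 255 style shown in the question.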

Is there a way to use custom fonts in a PDF file?

Well basically I'm finishing school in mid December so I'm just brushing up my resume and I'm wondering if there's a way to use custom fonts (in this case Calibri and Cambria) in a PDF file and make them render correctly on all computers.
Thanks in advance!
EDIT: I'm using MS Word 2007, but am open to suggestions
PDFs don't always store text and fonts like other documents. Some PDF producers convert the text to vector outlines, so that no matter what font you use, the document displays exactly as expected. This is why searching for text inside a PDF can be such a problem for 3rd party PDF readers, and why even Adobe used to distribute 2 versions of Acrobat (one with text search, one without).
Another thing to keep in mind is, PDF isn't pixel exact, it's ratio exact. PDF readers generally do not use a 100% zoom level, instead most people read them at "fit to screen" or "fit to page". I point this out because I'm guessing the reason you are trying to use those new Vista/Office 2007 fonts is because of their LCD subpixel support (improves readability on LCD screens). This feature will not translate into the PDF, since the letter becomes a vector, subpixel information is lost, and even if it wasn't, becomes useless because the vector will be sized to something other than you intended at view time.
The PDF format is capable of embedding fonts, if the font has been marked embeddable by its creator. You'll have to check the software that's creating your PDF to see if it has the capability and how to enable it.
Technically, whether a font may be embedded is controlled by a special flag in the font file (TTF, OpenType or Type 1).
You can view this embedding flag with any font editor. I recommend FontCreator (by High-Logic):
http://www.high-logic.com/font-editor/fontcreator.html
It has a fully functional free trial without limitations.
You can also change the embedding flag but, legally speaking, for 99% of commercially distributed fonts this breaks the font's license.
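For TrueType and OpenType fonts, the embedding flag mentioned above is the `fsType` field of the font's OS/2 table. As a small sketch, this is how the value could be decoded once you have it as an integer. The function name `embedding_permissions` is made up. Reading the value from a font file is left out here; with the fontTools package installed, something like `TTFont(path)["OS/2"].fsType` should give it to you.

```python
def embedding_permissions(fs_type):
    """Describe the embedding bits of an OS/2 fsType value."""
    notes = []
    # Bits 1-3 hold the basic usage permission.
    level = fs_type & 0x000E
    if level == 0:
        notes.append("installable embedding")
    if level & 0x0002:
        notes.append("restricted license: embedding not permitted")
    if level & 0x0004:
        notes.append("preview and print embedding")
    if level & 0x0008:
        notes.append("editable embedding")
    # Bits 8 and 9 are additional restrictions.
    if fs_type & 0x0100:
        notes.append("no subsetting")
    if fs_type & 0x0200:
        notes.append("bitmap embedding only")
    return notes


for value in (0x0000, 0x0002, 0x0108):
    print(hex(value), embedding_permissions(value))
```

This only reports what the font declares. Whether you are actually allowed to embed it is still a question for the font's license.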