Quarto: PDF Document - Figure Caption size - pdf

How can I customize the font size of figure captions in a Quarto PDF document? I have checked the mainfont and fontfamily options, but the documentation doesn't provide examples of how to change the font size for individual elements in a PDF document.

Since PDF output is ultimately produced by LaTeX, you just need to find the LaTeX solution for what you want to do and add that LaTeX code via Quarto's LaTeX includes (include-in-header).
To change the figure caption size, we can use the caption package. From section 2.3 of the caption package manual:
There are three font options which affect different parts of the caption: One affecting the whole caption (font), one which only affects the caption label and separator (labelfont) and at least one which only affects the caption text (textfont).
You set them up using the options font={⟨font options⟩}, labelfont={⟨font options⟩}, and textfont={⟨font options⟩}, where ⟨font options⟩ is a list of comma separated font options.
And these are the available font options:
scriptsize => Very small size
footnotesize => The size usually used for footnotes
small => Small size
normalsize => Normal size
large => Large size
Large => Even larger size
See section 2.3 of the manual for details and further options.
---
title: "Figure Caption Size"
format:
  pdf:
    include-in-header:
      text: |
        \usepackage[font=Large,labelfont={bf,Large}]{caption}
---
## Quarto
```{r}
#| fig-cap: "Just a scatterplot"
plot(rnorm(1:10), rnorm(1:10))
```


Change Title/Headings Font in Quarto PDF Output

When RMarkdown .rmd documents are knitted as PDF, the text body as well as the title, subtitle and headings are rendered in the same LaTeX standard font.
When rendering a Quarto .qmd document as PDF, the font for the text body remains the same, but the title, subtitle and headings are rendered in a different font, without serifs.
To achieve consistency between the outputs of older R Markdown documents and newer Quarto documents, I would like to change the font for the title, subtitle and headings back to the normal font. How can I achieve this?
I tried using fontfamily: in the YAML header, but this did not find the fonts I wanted. I had some success by using \setkomafont{section}{\normalfont} in include-in-header:, as this did change the font, but only for h1 headings, not for h2 nor for the title or subtitle. It also removed all other formatting for h1 (e.g. fontsize, bold, etc.), which is not what I want.
Using this answer from TeX StackExchange, we can do this in Quarto easily.
---
title: "Fonts"
subtitle: "Changing fonts of title, subtitle back to normal font"
author: "None"
format:
  pdf:
    include-in-header:
      text: |
        \addtokomafont{disposition}{\rmfamily}
---
## Quarto
Quarto enables you to weave together content and executable code into a
finished document. To learn more about Quarto see <https://quarto.org>.
## Running Code
When you click the **Render** button a document will be generated that includes
both content and the output of embedded code.
And of course, check section 3.6 (Text Markup) of the KOMA-Script manual, which provides a very detailed list of elements (like author, chapter, title, subtitle, date, etc.) for which such changes can be made.
If the font used in the body is known, then you can set the font used in title and headings with sansfont: .... It's wise to also set mainfont to make sure they are the same.
The default font used is Latin Modern Roman, so adding this to the YAML frontmatter should do it:
---
mainfont: Latin Modern Roman
sansfont: Latin Modern Roman
---

gnuplot: Hypertext or tags in PDF?

Is there any way to add hypertext or tags into PDFs via gnuplot?
According to the manual (gnuplot 5.4.0) it's not possible:
Some terminals (wxt, qt, svg, canvas, win) allow you to attach
hypertext to specific points on the graph or elsewhere on the canvas.
When the mouse hovers over the anchor point, a pop-up box containing
the text is displayed. Terminals that do not support hypertext will
display nothing.
Actually, there are 3 desires:
add hypertext into the PDF: when hovering with the mouse over the point, the text appears, like in the above terminals (also known as "tool tip" or "bubble help").
add hyperlinks into the PDF: when clicked, they redirect to a URL, e.g. www.gnuplot.info, or if possible to another local file (with absolute or relative path).
add some tags or labels which could be used further (when this PDF is included in a LaTeX document) to link to a different chapter, section or figure.
This is probably more a question for tex.stackexchange.
Of course, you can include a gnuplot graph (PNG, PDF) in a LaTeX document and then probably define areas on the graph for links etc. within LaTeX. However, every time the graph changes you would have to redefine all positions in LaTeX again and again. That's why I would like to do it automatically in gnuplot.
Maybe other plotting packages can do this, e.g. pgfplots or tikz or others, but since I feel comfortable with gnuplot I wanted to avoid using yet another package and check whether there might nevertheless be a way with gnuplot.
I'm aware that this is beyond gnuplot's focus of plotting, but maybe somebody knows about a workaround with gnuplot?
Code:
### hypertext in PDF???
reset session
set term pdfcairo size 29.7cm, 21.0cm font ",20" # A4 landscape
set output "Test.pdf"
$Data <<EOD
"Go here" 0.5 0.8
"Go there" 0.5 0.2
"Go left" 0.2 0.5
"Go right" 0.8 0.5
"www.gnuplot.info" 0.5 0.5
EOD
do for [i=1:|$Data|] {
    set label i word($Data[i],1) at screen word($Data[i],2), screen word($Data[i],3) hypertext point pt 6 ps 10
}
plot cos(x)
set output
### end of code
Result: (PNG screenshot of PDF just for illustration, of course there will be no hypertext).
If using LaTeX (with set term cairolatex or epslatex or tikz in non-standalone mode) is a valid solution, I would outsource the creation of hyperlinks and hypertexts to hyperref and put the corresponding LaTeX code as a title or label in the plot:
set label '\href{http://www.gnuplot.info}{click me!}' ...
plot ... title 'see \autoref{section:methods} and read Ref.\autocite{bib:John_Smith_2002}'
Use single quotes here: inside double quotes gnuplot interprets backslashes, so every backslash in the LaTeX code would have to be escaped.

Struggling with PDF output of bookdown

I thought it would be a good idea to write a longer report/protocol using bookdown since it's more comfortable to have one file per topic to write in instead of just one RMarkdown document with everything. Now I'm faced with the problem of sharing this document - the HTML looks best (except for wide tables being cut off) but is difficult to send via e-mail to a supervisor for example. I also can't expect anyone to be able to open the ePub format on their computer, so PDF would be the easiest choice. Now my problems:
My chapter headings are pretty long, which doesn't matter in HTML but they don't fit the page headers in the PDF document. In LaTeX I could define a short title for that, can I do that in bookdown as well?
I include figure files using knitr::include_graphics() inside of code chunks, so I generate the caption via the chunk options. For some figures, I can't avoid having an underscore in the caption, but that does not work out in LaTeX. Is there a way to escape the underscore that actually works (preferably for HTML and PDF at the same time)? My LaTeX output looks like this after rendering:
\textbackslash{}begin\{figure\}
\includegraphics[width=0.6\linewidth,height=0.6\textheight]{figures/0165_HMMER} \textbackslash{}caption\{Output of HMMER for PA\_0165\}\label{fig:0165}
\textbackslash{}end\{figure\}
Edit
MWE showing that the problem is an underscore in combination with out.height (or width) in percent:
---
title: "MWE FigCap"
author: "LilithElina"
date: "19 Februar 2020"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.
When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
```{r cars}
summary(cars)
```
## Including Plots
You can also embed plots, for example:
```{r pressure, echo=FALSE, fig.cap="This is a nice figure caption", out.height='40%'}
plot(pressure)
```
Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.
```{r pressure2, echo=FALSE, fig.cap="This is a not nice figure_caption", out.height='40%'}
plot(pressure)
```
Concerning shorter headings: pandoc, which is used for the markdown to LaTeX conversion, does not offer a "shorter heading". You can do that yourself, though:
# Really long chapter heading
\markboth{\thechapter~short heading}{}
[...]
## Really long section heading
\markright{\thesection~short heading}
This assumes a document class with chapters and sections.
Concerning the underscore in the figure caption: For me it works for both PDF and HTML to escape the underscore:
```{r pressure2, echo=FALSE, fig.cap="This is a not nice figure\\_caption", out.height='40%'}
plot(pressure)
```

Nbspace not available

I am using pdfbox 2.0.9
I have a PDF with an AcroForm only, and I want to set a non-breaking space (nbspace) character as the value of a field:
field.setValue("\u00A0");
But I get this error:
java.lang.IllegalArgumentException: U+00A0 ('nbspace') is not available in this font Courier encoding: WinAnsiEncoding
I understand that the font of the current field does not support this character.
How can I get the list of fonts available in my PDF with PDFBox 2.0.14?
This topic might be related: How to print `Non-breaking space` to a pdf using apache pdf box?
The text fields in your PDF use the font Helv.
The AcroForm resources font Helv is defined with the following encoding:
5 0 obj
<<
/Type/Encoding
/Differences[
24/breve/caron/circumflex/dotaccent/hungarumlaut/ogonek/ring/tilde
39/quotesingle
96/grave
128/bullet/dagger/daggerdbl/ellipsis/emdash/endash/florin/fraction
/guilsinglleft/guilsinglright/minus/perthousand/quotedblbase/quotedblleft
/quotedblright/quoteleft/quoteright/quotesinglbase/trademark/fi/fl/Lslash
/OE/Scaron/Ydieresis/Zcaron/dotlessi/lslash/oe/scaron/zcaron
160/Euro
164/currency
166/brokenbar
168/dieresis/copyright/ordfeminine
172/logicalnot/.notdef/registered/macron/degree/plusminus/twosuperior
/threesuperior/acute/mu
183/periodcentered/cedilla/onesuperior/ordmasculine
188/onequarter/onehalf/threequarters
192/Agrave/Aacute/Acircumflex/Atilde/Adieresis/Aring/AE/Ccedilla
/Egrave/Eacute/Ecircumflex/Edieresis/Igrave/Iacute/Icircumflex
/Idieresis/Eth/Ntilde/Ograve/Oacute/Ocircumflex/Otilde/Odieresis
/multiply/Oslash/Ugrave/Uacute/Ucircumflex/Udieresis/Yacute/Thorn
/germandbls/agrave/aacute/acircumflex/atilde/adieresis/aring/ae
/ccedilla/egrave/eacute/ecircumflex/edieresis/igrave/iacute
/icircumflex/idieresis/eth/ntilde/ograve/oacute/ocircumflex/otilde
/odieresis/divide/oslash/ugrave/uacute/ucircumflex/udieresis/yacute
/thorn/ydieresis
]
>>
endobj
As there is no font program embedded for this font, this encoding is based on the StandardEncoding. This base encoding does not contain a non-breaking space. Furthermore your Differences array does not add nbspace either.
Thus, you cannot draw a non-breaking space using that encoding and, therefore, also not using that Helv font.
As far as I know, PDFBox does not supply replacement fonts in such a case, i.e. if asked to create a new text field appearance while setting a value which contains a character not supported in the form field default appearance font encoding.
One work-around might be to not ask PDFBox to generate an appearance to start with, instead mark the AcroForm with a NeedAppearances value true, and hope a later PDF processor / viewer does use a replacement font in such a case. There is no guarantee this works, probably the next processor needing appearances also doesn't supply replacement fonts. Nonetheless, there at least is a chance it does...
Depending on the exact version of PDFBox, though,
field.setValue(value);
may always trigger appearance generation. If that is the case for you, you have to set the field value like this
field.getCOSObject().setString(COSName.V, value);

Extract text from PDF in respect to formatting (font size, type etc)

Is it possible to extract text from a PDF file according to a specific font, font size, font colour etc.? I prefer Perl, Python or *nix command-line utilities. My goal is to extract all headlines from a PDF file so that I have a nice index of the articles contained in a single PDF.
Text, font, font size and position (but not color, as far as I checked) are available from Ghostscript's txtwrite device (try the -dTextFormat=0 or -dTextFormat=1 option), as well as from MuPDF's mudraw with the -tt option. Then parse the XML-like output with e.g. Perl.
I have working code which extracts text from a PDF together with the font size.
I did this with the help of pdfminer, and it has worked on many PDFs.
from pdfminer.high_level import extract_pages
from pdfminer.layout import LTTextContainer, LTChar
import os

path = r'path\whereyour pdffile'  # folder that contains the PDF files
os.chdir(path)

Extract_Data = []  # collects [font size, text] for every text element
for PDF_file in os.listdir():
    if PDF_file.endswith('.pdf'):
        for page_layout in extract_pages(PDF_file):
            for element in page_layout:
                if isinstance(element, LTTextContainer):
                    Font_size = None  # stays None if the element has no characters
                    for text_line in element:
                        for character in text_line:
                            if isinstance(character, LTChar):
                                # ends up holding the size of the last character
                                Font_size = character.size
                    Extract_Data.append([Font_size, element.get_text()])
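To get from these [font size, text] pairs to a headline index, one option is to treat the most common font size as body text and keep only entries that are noticeably larger. The following is a minimal sketch under that assumption, which will not hold for every PDF; `find_headlines` and its `margin` parameter are illustrative names and not part of pdfminer.

```python
from collections import Counter


def find_headlines(extract_data, margin=1.0):
    """Return the text of entries whose font size exceeds the body size by more than `margin`.

    `extract_data` is a list of [font_size, text] pairs; entries with a
    font size of None are ignored.
    """
    sizes = [round(size, 1) for size, _ in extract_data if size is not None]
    if not sizes:
        return []
    # Assumption: the most frequent font size is the body text size.
    body_size = Counter(sizes).most_common(1)[0][0]
    return [
        text.strip()
        for size, text in extract_data
        if size is not None and size > body_size + margin
    ]
```

Calling `find_headlines(Extract_Data)` after the loop above would then give the candidate headlines in document order. Counting entries rather than characters is a simplification; a document with many short headlines may need a different way of estimating the body size.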
I have used fitz to accomplish the required task, as it is much faster compared to pdfminer. You can find my duplicate answer to a similar question here.
An example code snippet is shown below.
import fitz  # PyMuPDF

def scrape(keyword, filePath):
    results = []  # list of tuples that store the information as (text, font size, font name)
    pdf = fitz.open(filePath)  # filePath is a string that contains the path to the pdf
    for page in pdf:
        page_dict = page.get_text("dict")
        blocks = page_dict["blocks"]
        for block in blocks:
            if "lines" in block:  # image blocks have no "lines" entry
                for line in block["lines"]:
                    for span in line["spans"]:
                        # only store font information of a specific keyword (case-insensitive)
                        if keyword.lower() in span["text"].lower():
                            results.append((span["text"], span["size"], span["font"]))
                            # span["text"] -> string, span["size"] -> font size, span["font"] -> font name
    pdf.close()
    return results
If you wish to find the font information of every line, you may omit the if condition that checks for a specific keyword.
You can extract the text information in any desired format by understanding the structure of dictionary outputs that we obtain by using get_text("dict"), as mentioned in the documentation.
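As a rough illustration of that structure, the sketch below walks a dictionary shaped like the result of get_text("dict"): blocks contain lines, and lines contain spans that carry the text, size and font. The helper name `iter_spans` and the hand-written `sample_page` are assumptions made for this example; a real page dictionary contains more keys than shown here.

```python
def iter_spans(page_dict):
    """Yield (text, size, font) for every text span in a get_text("dict") style dictionary."""
    for block in page_dict.get("blocks", []):
        # Image blocks carry no "lines" key, so they contribute nothing here.
        for line in block.get("lines", []):
            for span in line.get("spans", []):
                yield span["text"], span["size"], span["font"]


# Hand-written stand-in for one page; only the keys used above are included.
sample_page = {
    "blocks": [
        {"type": 0, "lines": [
            {"spans": [{"text": "Headline", "size": 18.0, "font": "Helvetica-Bold"}]},
            {"spans": [{"text": "Body text", "size": 10.0, "font": "Helvetica"}]},
        ]},
        {"type": 1},  # an image block without lines
    ]
}

for text, size, font in iter_spans(sample_page):
    print(f"{size:5.1f}  {font:15}  {text}")
```

With PyMuPDF installed, the same helper could be fed `page.get_text("dict")` for each page of an opened document, and the resulting tuples filtered by size or font name to build the headline index.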