How can I italicize part of a ggplot2 plot title when knitting the figure in an RMarkdown file? - ggplot2

If I'm just working in R to save a plot as a PNG I'm able to use the {ggtext} package to incorporate basic markdown into elements of my plots, but {ggtext} outputs garbled text when I try using element_markdown() in an R chunk.
I've also tried:
my.title <- expression(paste0(italic("Species name"), " Rest of Title"))
ggplot... + labs(title = my.title)
with no luck (when knitting).

It appears gsub() will do the trick, but I would be interested in knowing if there is a more elegant solution.

Related

Camelot in python does not behave as expected

I have two pdf documents, both in same layout with different information. The problem is:
I can read one perfectly but the other one the data is unrecognizable.
This is an example which I can read perfectly, download here:
from_pdf = camelot.read_pdf('2019_05_2.pdf', flavor='stream', strict=False)
df_pdf = from_pdf[0].df
camelot.plot(from_pdf[0], kind='text').show()
print(from_pdf[0].parsing_report)
This is the dataframe as expected:
This is an example which after I read, the information is unrecognizable, download here:
from_pdf = camelot.read_pdf('2020_04_2.pdf', flavor='stream', strict=False)
df_pdf = from_pdf[0].df
camelot.plot(from_pdf[0], kind='text').show()
print(from_pdf[0].parsing_report)
This is the dataframe with unrecognizable information:
I don't understand what I have done wrong and why the same code doesn't work for both files. I need some help, thanks.
The problem: malformed PDF
Simply, the problem is that your second PDF is malformed / corrupted. It doesn't contain correct font information, so it is impossible to extract text from your PDF as is. It is a known and difficult problem (see this question).
You can check this by trying to open the PDF with Google Docs.
Google Docs tries to extract the text and this is the result:.
Possible solutions
If you want to extract the text, you can print the document to an image-based PDF and perform an OCR text extraction.
However, Camelot does not currently support image-based PDFs, so it is not possible to extract the table.
If you have no way to recover a well-formed PDF, you could try this strategy:
print PDF to an image-based PDF
add a good text layer to your image-based PDF (using OCRmyPDF)
try using Camelot to extract tables

Struggling with PDF output of bookdown

I thought it would be a good idea to write a longer report/protocol using bookdown since it's more comfortable to have one file per topic to write in instead of just one RMarkdown document with everything. Now I'm faced with the problem of sharing this document - the HTML looks best (except for wide tables being cut off) but is difficult to send via e-mail to a supervisor for example. I also can't expect anyone to be able to open the ePub format on their computer, so PDF would be the easiest choice. Now my problems:
My chapter headings are pretty long, which doesn't matter in HTML but they don't fit the page headers in the PDF document. In LaTeX I could define a short title for that, can I do that in bookdown as well?
I include figure files using knitr::include_graphics() inside of code chunks, so I generate the caption via the chunk options. For some figures, I can't avoid having an underscore in the caption, but that does not work out in LaTeX. Is there a way to escape the underscore that actually works (preferrably for HTML and PDF at the same time)? My LaTeX output looks like this after rendering:
\textbackslash{}begin\{figure\}
\includegraphics[width=0.6\linewidth,height=0.6\textheight]{figures/0165_HMMER} \textbackslash{}caption\{Output of HMMER for PA\_0165\}\label{fig:0165}
\textbackslash{}end\{figure\}
Edit
MWE showing that the problem is an underscore in combination with out.height (or width) in percent:
---
title: "MWE FigCap"
author: "LilithElina"
date: "19 Februar 2020"
output: pdf_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## R Markdown
This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see <http://rmarkdown.rstudio.com>.
When you click the **Knit** button a document will be generated that includes both content as well as the output of any embedded R code chunks within the document. You can embed an R code chunk like this:
```{r cars}
summary(cars)
```
## Including Plots
You can also embed plots, for example:
```{r pressure, echo=FALSE, fig.cap="This is a nice figure caption", out.height='40%'}
plot(pressure)
```
Note that the `echo = FALSE` parameter was added to the code chunk to prevent printing of the R code that generated the plot.
```{r pressure2, echo=FALSE, fig.cap="This is a not nice figure_caption", out.height='40%'}
plot(pressure)
```
Concerning shorter headings: pandoc, which is used for the markdown to LaTeX conversion, does not offer a "shorter heading". You can do that yourself, though:
# Really long chaper heading
\markboth{\thechapter~short heading}{}
[...]
## Really long section heading
\markright{\thesection~short heading}
This assumes a document class with chapters and sections.
Concerning the underscore in the figure caption: For me it works for both PDF and HTML to escape the underscore:
```{r pressure2, echo=FALSE, fig.cap="This is a not nice figure\\_caption", out.height='40%'}
plot(pressure)
```

Knitr Spin and Rmarkdown Fig.cap (figure caption). Producing double numbering pdf document

I am referring to this Suppress automatic figure numbering in pdf output with r markdown/knitr
which I don't think was answered fully.
Essentially, I am using knitr::spin and rmarkdown to produce word, pdf and html documents.
For word, there appears to be no numbering when one puts in
+fig.1, fig.cap = "Figure name"
You only get an output Figure name in the caption.
To solve that, I used captioner class.
figs = captioner("Figure")
That works fine for word
But I am not faced with rewriting the script for pdf document as the caption turns up as figure 1: figure 1: The name
I am using knitr::spin to actually generate the RMD document for forward outputs in word and pdf.
I am not sure I can use hooks in knitr::spin, as I have tried it as advertised but can't get it to work.
I also tried
header-includes: \usepackage{caption} \usepackage{float}
\captionsetup[table]{labelformat=empty}
\captionsetup[figure]{labelformat=empty}
as suggested somehere to surpress the prefix for pdf but I get errors from pandoc. It uses pdf2latex.
I am not sure how one would query the output format in knitr::spin to actually produce different actions for different formats which could be a solution although cumbersome.
Thank you so much for your help from a novice.

R markdown to PDF preserve (leading) whitespace

I have an R markdown file that I want to convert to PDF using knitr (or sweave).
For example:
---
output: pdf_document
---
```{python}
for x in range(3):
for y in range(3):
print(x+y)
```
If I knit to PDF and copy paste the for loop part back to a text editor, the tabs are gone. This is certainly expected behaviour considering how anti whitespace-preservation markdown and pdf are, but can I still somehow preserve the actual whitespace characters when knitting from R markdown to PDF?
It is not possible, there's nothing that can be done other than making sure the document looks right, i.e. visually the whitespace is there.
It comes down to what the PDF format actually is, read the accepted answer to this question - https://superuser.com/questions/198392/how-to-copy-text-out-of-a-pdf-without-losing-formatting

SPSS tables to latex (PDF) without creating an A4 page

This may be a stupid question, but I can't figure it out.
I have made some tables in SPSS. Now I want them over to my latex document.
What I do, it that I right-click the table in SPSS, and press export.
Here I can choose between PDF or .doc. BUT the PDF-file created, generates a file with the table on top of a page (A4 size, with "page 1" at the bottom). I do not want this, I only want the table.
example how it turns out:
Example how I want it to turn out:
If I export to word, I can further save as PDF, but same problem occurs.
Screenshot works, but does not give me the same picture-quality that I prefer.
Do anyone of you have any tips for me?
Thanks :)
Unfortunately SPSS does not provide native table export to Latex. It does provide table export to html and xls, which can post-hoc be converted to Tex tables. PDF output for everything forces to export the full page (very annoying for graphics as well) - but you probably don't want to insert the image of the table (you could crop the PDF if need be), but have a Tex table (in the same font) as your document anyway.
One thing I have done in the past to make the export to text tables with specific markup is to use the PRINT or LIST commands to print the text table to the output (or to a text file) that is closer to the end goal. In this NABBLE post I have some syntax that makes pandoc flavored pipe style markdown tables - it should be pretty clear how that same approach could be used for Tex tables (actually Tex tables should be much simpler).
Here is an example of some code using LIST to make a the markup closer to Tex tables.
DATA LIST FREE / Variable (A1) Mean Median (2F4.2).
BEGIN DATA
A 3.25 2.00
B 2.56 2.50
C 9.87 10.20
END DATA.
*Using LIST to make Latex style table.
STRING Mid (A1) End (A2).
COMPUTE Mid = "&".
COMPUTE End = "//".
LIST /VARIABLES = Variable Mid Mean Mid Median End.
And here is a screen shot of the produced output on my machine.
So here I would still have to copy-paste the text output into my Tex document, (and make the header row).
You can also use OMS to save designed items in a variety of formats, including XML and then use an xml-to-Latex tool such as xmltex. You could probably even generate such a conversion with XSLT from the XML.
From the Viewer, you could also retrieve the table with Python scripting and use a Python-based converter tool.