Why are knitr and Sweave PDF filenames numbered? - pdf

If I write a simple Rnw document containing a figure, e.g.,
\documentclass[11pt]{article}
%~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
<<setup, include=FALSE, cache=FALSE>>=
opts_chunk$set(dev = "pdf", comment = NA, fig.path = "figure/", fig.align='center', cache=FALSE, message=FALSE, background='white')
options(replace.assign=TRUE,width=85, digits = 8)
knit_hooks$set(fig=function(before, options, envir){if (before) par(mar=c(4,4,.1,.1),cex.lab=.95,cex.axis=.9,mgp=c(2,.7,0),tcl=-.3)})
@
<<prepare-data, include=FALSE>>=
@
%~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
\begin{document}
A simple plot
\begin{figure}
<< scat, echo = FALSE, fig.width = 4.5, fig.height=3>>=
plot(runif(10), runif(10), pch = 20)
@
\end{figure}
\end{document}
Why does knitr create a PDF file with the filename figure/scat-1.pdf instead of figure/scat.pdf?

The reason was explained in the v1.7 release notes. In the development version (to be v1.8), you can use fig_chunk() to obtain the figure filenames (see package NEWS). Also see a related discussion here.

Related

LaTeX fails to compile PDF

Since yesterday, I have been getting a confusing error when trying to knit my Markdown files to PDF. This is the output:
This is pdfTeX, Version 3.141592653-2.6-1.40.24 (TeX Live 2022) (preloaded format=pdflatex)
restricted \write18 enabled.
entering extended mode
I was unable to find any missing LaTeX packages from the error log Chapter-5-1.log.
! Extra }, or forgotten \endgroup.
\endwraptable ...kip \egroup \box \z@ \fi \egroup
\WF@floatstyhook \def \wid...
l.156 \end{wraptable}
Error: LaTeX failed to compile Chapter-5-1.tex. See https://yihui.org/tinytex/r/#debugging for debugging tips. See Chapter-5-1.log for more info.
Execution halted
The problem seems to be the kable tables, which knitted fine in earlier attempts. The error log contains this:
Overfull \hbox (165.80571pt too wide) in paragraph at lines 142--158
[][]
[]
! Extra }, or forgotten \endgroup.
\endwraptable ...kip \egroup \box \z@ \fi \egroup
\WF@floatstyhook \def \wid...
l.158 \end{wraptable}
A friend of mine tried to compile the .tex file directly to PDF but got the same error. We were not able to find a missing } or any other error in the code. All tables and code run without error in the Markdown preview.
I get the same error when I try to knit a document with a dummy table, so I hope this works as a reproducible example.
---
title: "Chapter 5"
author: "Moiken Hinrichs"
date: "`r format(Sys.time(), '%d.%m.%Y')`"
header-includes:
- \usepackage{caption}
- \usepackage{float}
- \captionsetup[figure]{font=footnotesize}
- \usepackage{subfig}
- \usepackage{flafter}
- \usepackage{footmisc}
output:
  bookdown::pdf_document2:
    extra_dependencies: ["flafter"]
    number_sections: true
    toc: false
    latex_engine: xelatex
classoption: twoside
keep_tex: true
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(knitr)
library(kableExtra)
library(dplyr, quietly=T)
library(tidyr)
```
```{r prox}
column1 <- c("Bert", "Angela", "Nico", "Pam", "Max")
column2 <- c(33, 54, 85, 96, 57)
prox <- data.frame(column1, column2)
prox[nrow(prox)+1, ] <- c('Total', sum(prox$column2))
kable(prox, booktabs = T, linesep = "",
col.names = c("Name", "Count"),
caption = "Number of items") %>%
kable_styling(latex_options = "striped", position = "float_right") %>%
row_spec(5, hline_after = T) %>%
row_spec(6, bold = T)
```
RStudio and TinyTeX are up to date. I tried all the debugging tips and reinstalled TinyTeX. The same error also occurs when I knit on another computer. I also tried switching from xelatex to pdflatex, but the error was the same.
The wrapfig package does not particularly like zero-width wrapfigure or wraptable environments with captions. Unfortunately, that is the width rmarkdown seems to use by default.
The package documentation has the following to say about the topic:
[...] However, if you specify a width of zero (0pt), the actual width of the figure will determine the wrapping width. A following \caption should have the same width as the figure, but it might fail badly; it is safer to specify a width when you use a caption.
You can avoid the problem by choosing a suitable width:
---
title: "Chapter 5"
author: "Moiken Hinrichs"
date: "`r format(Sys.time(), '%d.%m.%Y')`"
header-includes:
- \usepackage{caption}
- \usepackage{float}
- \captionsetup[figure]{font=footnotesize}
- \usepackage{subfig}
- \usepackage{flafter}
- \usepackage{footmisc}
- \usepackage{lipsum}
output:
  bookdown::pdf_document2:
    extra_dependencies: ["flafter"]
    number_sections: true
    toc: false
    latex_engine: xelatex
classoption: twoside
keep_tex: true
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
library(knitr)
library(kableExtra)
library(dplyr, quietly=T)
library(tidyr)
```
```{r prox}
column1 <- c("Bert", "Angela", "Nico", "Pam", "Max")
column2 <- c(33, 54, 85, 96, 57)
prox <- data.frame(column1, column2)
prox[nrow(prox)+1, ] <- c('Total', sum(prox$column2))
kable(prox, booktabs = T, linesep = "",
col.names = c("Name", "Count"),
caption = "Number of items") %>%
kable_styling(latex_options = "striped", position = "float_right", wraptable_width = "3cm") %>%
row_spec(5, hline_after = T) %>%
row_spec(6, bold = T)
```
\lipsum

Using Python 3.8, I would like to extract text from a random PDF file

I would like to import a PDF file and find the most common words.
import PyPDF2
# Open the PDF file and read the text
pdf_file = open("nita20.pdf", "rb")
pdf_reader = PyPDF2.PdfReader(pdf_file)
text = ""
for page in range(pdf_reader.pages):
    text += pdf_reader.getPage(page).extractText()
I get this error:
TypeError: '_VirtualList' object cannot be interpreted as an integer
How can I resolve this issue so that I can extract every word from the PDF file? Thanks.
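I got some deprecation warnings on your code, but this works (tested on Python 3.11, PyPDF2 version 3.0.1). The TypeError occurs because range() needs an integer, while pdf_reader.pages is a list-like object; take its length or, more simply, loop over the pages directly.
import PyPDF2
# Open the PDF file and read the text
with open("../test.pdf", "rb") as pdf_file:
    pdf_reader = PyPDF2.PdfReader(pdf_file)
    print(len(pdf_reader.pages))  # number of pages
    text = ""
    # pdf_reader.pages is list-like, so iterate over it directly
    for page in pdf_reader.pages:
        text += page.extract_text()
print(text)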
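Once the text has been extracted, the stated goal of finding the most common words could be sketched with collections.Counter from the standard library. This is only a minimal sketch: most_common_words is a hypothetical helper name, and the tokenization rule (lowercase runs of letters and apostrophes) is an assumption that may need adjusting for real PDF text.

```python
import re
from collections import Counter


def most_common_words(text, n=10):
    """Return the n most frequent words in text as (word, count) pairs."""
    # Assumption: a "word" is a run of letters or apostrophes, case-insensitive.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common(n)


# Stand-in for the text extracted from the PDF
sample = "the cat and the hat and the bat"
print(most_common_words(sample, 2))  # [('the', 3), ('and', 2)]
```

In practice you would pass the text variable built from the PDF pages instead of sample, and probably filter out very short or very common words first.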

! Package pdftex.def Error - when knitting to PDF

I am able to knit to PDF for the example below:
---
title: "R Notebook"
output:
  pdf_document: default
  html_notebook: default
  html_document:
    df_print: paged
---
Table 1 example:
```{r, warning=FALSE, message=FALSE, echo=FALSE, include=FALSE, fig.pos="H"}
library(magrittr)
library(tidyverse)
library(kableExtra)
library(readxl)
library(modelsummary)
library(scales)
tmp <- mtcars
# create a list with individual variables
# remove missing and rescale
tmp_list <- lapply(tmp, na.omit)
tmp_list <- lapply(tmp_list, scale)
# create a table with `datasummary`
# add a histogram with column_spec and spec_hist
# add a boxplot with column_spec and spec_boxplot
emptycol = function(x) " "
final_4_table <- datasummary(mpg + cyl + disp + hp + drat + wt + qsec + vs + am + gear + carb ~ N + Mean + SD + Heading("Boxplot") * emptycol + Heading("Histogram") * emptycol, data = tmp) %>%
column_spec(column = 5, image = spec_boxplot(tmp_list)) %>%
column_spec(column = 6, image = spec_hist(tmp_list))
```
```{r finaltable, echo=FALSE}
final_4_table
```
However, I cannot knit my own code, which involves more variables, to PDF. My R Markdown document starts by reading my Excel file and is otherwise pretty much the same as the example above:
---
title: "table1"
output:
  pdf_document: default
  html_document:
    df_print: paged
---
Table 1
```{r prep-tableone, message=FALSE, warning=FALSE, echo=FALSE, include=FALSE, fig.pos="H"}
library(magrittr)
library(tidyverse)
library(kableExtra)
library(readxl)
library(modelsummary)
library(scales)
### set directory
setwd("/etcetc1")
## read dataset
my_dataset <- read_excel("my_dataset.xlsx")
my_dataset <- as.data.frame(my_dataset)
...
I can run this code in R script; it works just fine. I can also knit this code to HTML just fine. When trying to knit to PDF I get the following error:
output file: table1test.knit.md
! Package pdftex.def Error: File `table1test_files/figure-latex//boxplot_65c214bae9bb.pdf' not found: using draft setting.
Error: LaTeX failed to compile table1test.tex. See https://yihui.org/tinytex/r/#debugging for debugging tips. See table1test.log for more info.
In addition: Warning messages:
1: package 'ggplot2' was built under R version 4.1.1
2: package 'tibble' was built under R version 4.1.1
3: package 'tidyr' was built under R version 4.1.1
4: In in_dir(input_dir(), evaluate(code, envir = env, new_device = FALSE, :
You changed the working directory to /etcetc1 (probably via setwd()). It will be restored to /etcetc2. See the Note section in ?knitr::knit
Execution halted
Am I missing any packages? Do you know what might be happening?
I use TeXShop for LaTeX.

Python new FontManager

Hi everybody. More than a year ago I developed a quality-plot library (for publications).
My base class contains these four lines:
font_dirs = ['/home/marco/.fonts', ]
font_files = font_manager.findSystemFonts(fontpaths=font_dirs)
font_list = font_manager.createFontList(font_files)
font_manager.fontManager.ttflist.extend(font_list)
but when I run any plot class defined in this library (inherited from the base class with the four lines above), I get this message:
The createFontList function was deprecated in Matplotlib 3.2 and will be removed two minor releases later. Use FontManager.addfont instead.
/usr/lib/python3.7/_collections_abc.py:841: MatplotlibDeprecationWarning: Support for setting the 'text.latex.preamble' or 'pgf.preamble' rcParam to a list of strings is deprecated since 3.3 and will be removed two minor releases later; set it to a single string instead.
self[key] = other[key]
Can somebody help me understand what I have to modify to make everything work?
I had a similar problem with the preamble for the pgf and LaTeX settings of my plots:
I had to change the following code:
pgf_with_latex = { # setup matplotlib to use latex for output
"pgf.texsystem": "pdflatex", # change this if using xetex or lualatex
"text.usetex": True, # use LaTeX to write all text
"font.family": "serif",
"font.serif": [], # blank entries should cause plots
"font.sans-serif": [], # to inherit fonts from the document
"font.monospace": [],
"axes.labelsize": 10, # LaTeX default is 10pt font.
"font.size": 10,
"legend.fontsize": 8, # Make the legend/label fonts
"xtick.labelsize": 8, # a little smaller
"ytick.labelsize": 8,
"figure.figsize": figsize(0.9), # default fig size of 0.9 textwidth
"pgf.preamble": [
r"\usepackage[utf8x]{inputenc}", # use utf8 fonts
r"\usepackage[T1]{fontenc}", # plots will be generated
r"\usepackage[detect-all,locale=DE]{siunitx}",
] # using this preamble
}
to:
pgf_with_latex = { # setup matplotlib to use latex for output
"pgf.texsystem": "pdflatex", # change this if using xetex or lualatex
"text.usetex": True, # use LaTeX to write all text
"font.family": "serif",
"font.serif": [], # blank entries should cause plots
"font.sans-serif": [], # to inherit fonts from the document
"font.monospace": [],
"axes.labelsize": 10, # LaTeX default is 10pt font.
"font.size": 10,
"legend.fontsize": 8, # Make the legend/label fonts
"xtick.labelsize": 8, # a little smaller
"ytick.labelsize": 8,
"pgf.preamble": r"\usepackage[detect-all,locale=DE]{siunitx} \usepackage[T1]{fontenc} \usepackage[utf8x]{inputenc}"}
You just have to rewrite your lists (here in pgf.preamble) to a single string.
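Both warnings from the question can be handled along these lines. This is a hedged sketch, not a tested drop-in: join_preamble and register_fonts are hypothetical helper names, the font directory is the one from the question, and the addfont call follows the replacement that the deprecation message itself suggests.

```python
def join_preamble(lines):
    """Collapse a list of LaTeX preamble lines into one newline-separated string."""
    return "\n".join(lines)


def register_fonts(font_dirs):
    """Register every font file found under font_dirs and return how many were added."""
    # Imported here so join_preamble stays usable without Matplotlib installed.
    from matplotlib import font_manager

    font_files = font_manager.findSystemFonts(fontpaths=font_dirs)
    for font_file in font_files:
        # Replaces createFontList + ttflist.extend, as the warning recommends
        font_manager.fontManager.addfont(font_file)
    return len(font_files)


preamble = join_preamble([
    r"\usepackage[utf8x]{inputenc}",
    r"\usepackage[T1]{fontenc}",
    r"\usepackage[detect-all,locale=DE]{siunitx}",
])
print(preamble)
```

The joined string can then be assigned to "pgf.preamble" (or "text.latex.preamble"), and register_fonts(['/home/marco/.fonts']) would take the place of the four lines in the base class.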

How to set the "band description" option/tag of a GeoTIFF file using GDAL (gdalwarp/gdal_translate)

Does anybody know how to change or set the "Description" option/tag of a GeoTIFF file using GDAL?
To specify what I mean, this is an example of gdalinfo return from a GeoTIFF file with set "Description":
Band 1 Block=64x64 Type=UInt16, ColorInterp=Undefined
Description = AVHRR Channel 1: 0.58 micrometers -- 0.68 micrometers
Min=0.000 Max=814.000
Minimum=0.000, Maximum=814.000, Mean=113.177, StdDev=152.897
Metadata:
LAYER_TYPE=athematic
STATISTICS_MAXIMUM=814
STATISTICS_MEAN=113.17657236931
STATISTICS_MINIMUM=0
STATISTICS_STDDEV=152.89720574652
In the example you can see: Description = AVHRR Channel 1: 0.58 micrometers -- 0.68 micrometers
How do I set this parameter using GDAL?
In Python you can set the band description like this:
from osgeo import gdal, osr
import numpy
# Define output image name, size and projection info:
OutputImage = 'test.tif'
SizeX = 20
SizeY = 20
CellSize = 1
X_Min = 563220.0
Y_Max = 699110.0
N_Bands = 10
srs = osr.SpatialReference()
srs.ImportFromEPSG(2157)
srs = srs.ExportToWkt()
GeoTransform = (X_Min, CellSize, 0, Y_Max, 0, -CellSize)
# Create the output image:
Driver = gdal.GetDriverByName('GTiff')
Raster = Driver.Create(OutputImage, SizeX, SizeY, N_Bands, 2) # Datatype = 2 same as gdal.GDT_UInt16
Raster.SetProjection(srs)
Raster.SetGeoTransform(GeoTransform)
# Iterate over each band
for band in range(N_Bands):
    BandNumber = band + 1
    BandName = 'SomeBandName '+ str(BandNumber).zfill(3)
    RasterBand = Raster.GetRasterBand(BandNumber)
    RasterBand.SetNoDataValue(0)
    RasterBand.SetDescription(BandName) # This sets the band name!
    RasterBand.WriteArray(numpy.ones((SizeX, SizeY)))
# close the output image
Raster = None
print("Done.")
Unfortunately, I'm not sure if ArcGIS or QGIS are able to read the band descriptions. However, the band names are clearly visible in Tuiview.
GDAL includes a python application called gdal_edit.py which can be used to modify the metadata of a file in place. I am not familiar with the Description field you are referring to, but this tool should be the one to use.
Here is the man page: gdal_edit.py
Here is an example script using an ortho-image I downloaded from the USGS Earth-Explorer.
#!/bin/sh
# Image to modify
IMAGE_PATH='11skd505395.tif'
# Field to modify
IMAGE_FIELD='TIFFTAG_IMAGEDESCRIPTION'
# Print the tiff image description tag
gdalinfo $IMAGE_PATH | grep $IMAGE_FIELD
# Change the Field
CMD="gdal_edit.py -mo ${IMAGE_FIELD}='Lake-Tahoe' $IMAGE_PATH"
echo $CMD
$CMD
# Print the new field value
gdalinfo $IMAGE_PATH | grep $IMAGE_FIELD
Output
$ ./gdal-script.py
TIFFTAG_IMAGEDESCRIPTION=OrthoVista
gdal_edit.py -mo TIFFTAG_IMAGEDESCRIPTION='Lake-Tahoe' 11skd505395.tif
TIFFTAG_IMAGEDESCRIPTION='Lake-Tahoe'
Here is another link that should provide useful info.
https://gis.stackexchange.com/questions/111610/how-to-overwrite-metadata-in-a-tif-file-with-gdal
Here's a single purpose python commandline script to edit band description in place.
''' Set image band description to specified text'''
import os
import sys
from osgeo import gdal
gdal.UseExceptions()
if len(sys.argv) < 4:
    print(f"Usage: {sys.argv[0]} [in_file] [band#] [text]")
    sys.exit(1)
infile = sys.argv[1] # source filename and path
inband = int(sys.argv[2]) # source band number
descrip = sys.argv[3] # description text
data_in = gdal.Open(infile, gdal.GA_Update)
band_in = data_in.GetRasterBand(inband)
old_descrip = band_in.GetDescription()
band_in.SetDescription(descrip)
new_descrip = band_in.GetDescription()
# de-reference the dataset, which triggers gdal to save
data_in = None
print(f"Description was: {old_descrip}")
print(f"Description now: {new_descrip}")
In use:
$ python scripts\gdal-edit-band-desc.py test-edit.tif 1 "Red please"
Description was:
Description now: Red please
$ gdal-edit-band-desc test-edit.tif 1 "Red please also"
$ python t:\ENV.558\scripts\gdal-edit-band-desc.py test-edit.tif 1 "Red please also"
Description was: Red please
Description now: Red please also
Properly, this should be added to gdal_edit.py, but I don't know enough to feel safe adding it directly.
gdal_edit.py with the -mo flag can be used to edit the band descriptions, with the bands numbered starting from 1:
gdal_edit.py -mo BAND_1=AVHRR_Channel_1_p58_p68_um -mo BAND_2=AVHRR_Channel_2 avhrr.tif
I didn't try it with the special characters but that might work if you use the right quotes.
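Whichever approach is used, it is worth confirming that the description was actually written. One way is to scan gdalinfo output for the Description line under each band. The following is a minimal sketch that assumes the text layout shown in the question; band_descriptions is a hypothetical helper, not part of GDAL.

```python
import re


def band_descriptions(gdalinfo_text):
    """Map band number -> description, parsed from gdalinfo-style text output."""
    descriptions = {}
    current_band = None
    for line in gdalinfo_text.splitlines():
        band_match = re.match(r"\s*Band (\d+)\b", line)
        if band_match:
            # Remember which band the following lines belong to
            current_band = int(band_match.group(1))
            continue
        desc_match = re.match(r"\s*Description = (.*)", line)
        if desc_match and current_band is not None:
            descriptions[current_band] = desc_match.group(1).strip()
    return descriptions


# Excerpt in the same layout as the gdalinfo output from the question
sample = """Band 1 Block=64x64 Type=UInt16, ColorInterp=Undefined
  Description = AVHRR Channel 1: 0.58 micrometers -- 0.68 micrometers
  Min=0.000 Max=814.000
Band 2 Block=64x64 Type=UInt16, ColorInterp=Undefined
  Min=0.000 Max=10.000
"""
print(band_descriptions(sample))
```

Bands without a Description line are simply missing from the result, which makes it easy to spot a band whose description was not set.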