Storing plot objects in a list - pdf

I asked this question yesterday about storing a plot within an object. I tried implementing the first approach (aware that I did not specify that I was using qplot() in my original question) and noticed that it did not work as expected.
library(ggplot2) # add ggplot2
string = "C:/example.pdf" # Setup pdf
pdf(string,height=6,width=9)
x_range <- range(1,50) # Specify Range
# Create a list to hold the plot objects.
pltList <- list()
pltList[]
for(i in 1 : 16){
# Organise data
y = (1:50) * i * 1000 # Get y col
x = (1:50) # get x col
y = log(y) # Use natural log
# Regression
lm.0 = lm(formula = y ~ x) # make linear model
inter = summary(lm.0)$coefficients[1,1] # Get intercept
slop = summary(lm.0)$coefficients[2,1] # Get slope
# Make plot name
pltName <- paste( 'a', i, sep = '' )
# make plot object
p <- qplot(
x, y,
xlab = "Radius [km]",
ylab = "Services [log]",
xlim = x_range,
main = paste("Sample",i)
) + geom_abline(intercept = inter, slope = slop, colour = "red", size = 1)
print(p)
pltList[[pltName]] = p
}
# close the PDF file
dev.off()
I have used sample numbers in this case so the code runs if it is just copied. I did spend a few hours puzzling over this but I cannot figure out what is going wrong. It writes the first pdf without problem, so I have a 16-page pdf with the correct plots.
Then when I use this piece of code:
string = "C:/test_tabloid.pdf"
pdf(string, height = 11, width = 17)
grid.newpage()
pushViewport( viewport( layout = grid.layout(3, 3) ) )
vplayout <- function(x, y){viewport(layout.pos.row = x, layout.pos.col = y)}
counter = 1
# Page 1
for (i in 1:3){
for (j in 1:3){
pltName <- paste( 'a', counter, sep = '' )
print( pltList[[pltName]], vp = vplayout(i,j) )
counter = counter + 1
}
}
dev.off()
the result I get is the last linear model line (abline) on every graph, but the data does not change. When I check my list of plots, it seems that all of them become overwritten by the most recent plot (with the exception of the abline object).
A less important secondary question was how to generate a multi-page pdf with several plots on each page, but the main goal of my code was to store the plots in a list that I could access at a later date.

Ok, so if your plot command is changed to
p <- qplot(data = data.frame(x = x, y = y),
x, y,
xlab = "Radius [km]",
ylab = "Services [log]",
xlim = x_range,
ylim = c(0,10),
main = paste("Sample",i)
) + geom_abline(intercept = inter, slope = slop, colour = "red", size = 1)
then everything works as expected. Here's what I suspect is happening (although Hadley could probably clarify things). When ggplot2 "saves" the data, what it actually does is save a data frame, and the names of the parameters. So for the command as I have given it, you get
> summary(pltList[["a1"]])
data: x, y [50x2]
mapping: x = x, y = y
scales: x, y
faceting: facet_grid(. ~ ., FALSE)
-----------------------------------
geom_point:
stat_identity:
position_identity: (width = NULL, height = NULL)
mapping: group = 1
geom_abline: colour = red, size = 1
stat_abline: intercept = 2.55595281266726, slope = 0.05543539319091
position_identity: (width = NULL, height = NULL)
However, if you don't specify a data parameter in qplot, all the variables get evaluated in the current scope, because there is no attached (read: saved) data frame.
data: [0x0]
mapping: x = x, y = y
scales: x, y
faceting: facet_grid(. ~ ., FALSE)
-----------------------------------
geom_point:
stat_identity:
position_identity: (width = NULL, height = NULL)
mapping: group = 1
geom_abline: colour = red, size = 1
stat_abline: intercept = 2.55595281266726, slope = 0.05543539319091
position_identity: (width = NULL, height = NULL)
So when the plot is generated the second time around, rather than using the original values, it uses the current values of x and y.
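A minimal sketch of the same fix, assuming the toy data from the question: each plot is built from its own data frame, so the stored object no longer depends on the current values of x and y. The helper name make_plot is hypothetical, and ggplot() is used here in place of qplot().

```r
library(ggplot2)

# Hypothetical helper: builds one plot from its own data frame.
make_plot <- function(i) {
  d <- data.frame(x = 1:50, y = log((1:50) * i * 1000))
  fit <- coef(lm(y ~ x, data = d))  # fit[[1]] = intercept, fit[[2]] = slope
  ggplot(d, aes(x, y)) +
    geom_point() +
    geom_abline(intercept = fit[[1]], slope = fit[[2]], colour = "red") +
    labs(x = "Radius [km]", y = "Services [log]", title = paste("Sample", i))
}

# Named list a1..a16, matching the naming scheme in the question.
pltList <- lapply(1:16, make_plot)
names(pltList) <- paste0("a", 1:16)

# Each stored plot carries its own copy of the data.
head(pltList[["a1"]]$data)
head(pltList[["a16"]]$data)
```

Because the data frame is created inside the function, nothing in the stored plot refers back to variables that a later loop iteration could overwrite.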

I think you should use the data argument in qplot, i.e., store your vectors in a data frame.
See Hadley's book, Section 4.4:
The restriction on the data is simple: it must be a data frame. This is restrictive, and unlike other graphics packages in R. Lattice functions can take an optional data frame or use vectors directly from the global environment. ...
The data is stored in the plot object as a copy, not a reference. This has two important consequences: if your data changes, the plot will not; and ggplot2 objects are entirely self-contained so that they can be save()d to disk and later load()ed and plotted without needing anything else from that session.

There is a bug in your code concerning list subscripting. It should be
pltList[[pltName]]
not
pltList[pltName]
Note:
class(pltList[1])
[1] "list"
pltList[1] is a list containing the first element of pltList.
class(pltList[[1]])
[1] "ggplot"
pltList[[1]] is the first element of pltList.
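The difference can be checked with a small base R example. The list below is a made-up stand-in, not the plot list from the question:

```r
# Stand-in list with two named elements.
lst <- list(a1 = 1:3, a2 = letters)

sub  <- lst["a1"]    # single brackets: a list of length 1
elem <- lst[["a1"]]  # double brackets: the element itself

class(sub)
class(elem)
```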

For your second question: Multi-page pdfs are easy -- see help(pdf):
onefile: logical: if true (the default) allow multiple figures in one
file. If false, generate a file with name containing the
page number for each page. Defaults to ‘TRUE’.
For your main question, I don't understand if you want to store the plot inputs in a list for later processing, or the plot outputs. If it is the latter, I am not sure that plot() returns an object you can store and retrieve.
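For the multi-page part, here is a rough sketch that combines the default onefile behaviour with the 3 x 3 grid layout from the question. The stand-in plots and the output path are assumptions for illustration; with 16 plots this should give one full page and one partly filled page.

```r
library(ggplot2)
library(grid)

# Stand-in plots; in practice this would be the stored list from the question.
plots <- lapply(1:16, function(i) {
  d <- data.frame(x = 1:50, y = log((1:50) * i * 1000))
  ggplot(d, aes(x, y)) + geom_point() + ggtitle(paste("Sample", i))
})

out_file <- file.path(tempdir(), "test_tabloid.pdf")  # hypothetical output path
pdf(out_file, height = 11, width = 17)                # onefile = TRUE is the default

per_page <- 9
for (start in seq(1, length(plots), by = per_page)) {
  grid.newpage()  # start a new page in the same file
  pushViewport(viewport(layout = grid.layout(3, 3)))
  idx <- start:min(start + per_page - 1, length(plots))
  for (k in seq_along(idx)) {
    row <- (k - 1) %/% 3 + 1
    col <- (k - 1) %% 3 + 1
    print(plots[[idx[k]]],
          vp = viewport(layout.pos.row = row, layout.pos.col = col))
  }
}
dev.off()
```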

Another suggestion regarding your second question would be to use either Sweave or Brew as they will give you complete control over how you display your multi-page pdf.
Have a look at this related question.

Related

Adding multiple labels to a branch in a phylogenetic tree using geom_label

I am very new to R, so I am sorry if this question is obvious. I would like to add multiple labels to branches in a phylogenetic tree, but I can only figure out how to add one label per branch. I am using the following code:
treetext = "(Japon[&&NHX:S=2],(((Al,Luteo),(Loam,(Niet,Cal))),(((Car,Bar),(Aph,Long[&&NHX:S=1],((Yam,Stig),((Zey,Semp),(A,(Hap,(This,That))))))));"
mytree <- read.nhx(textConnection(treetext))
ggtree(mytree) + geom_tiplab() +
geom_label(aes(x=branch, label=S))
I can add multiple symbols to a branch using the code below, but it is so labor-intensive that I may as well do it by hand:
ggtree(mytree) +
geom_tiplab()+
geom_nodepoint(aes(subset = node == 32, x = x - .5),
size = 5, colour = "black", shape = 15) +
geom_nodepoint(aes(subset = node == 32, x = x - 2),
size = 5, colour = "gray", shape = 15)
A solution using the "ape" package would be:
library(ape)
mytree <- rtree(7) # A random tree
labels1 <- letters[1:6]
labels2 <- LETTERS[1:6]
plot(mytree)
# Setting location of label with `adj`
nodelabels(labels1, adj = c(1, -1), frame = 'none')
# We could also use `pos =` 1: below node; 3: above node
nodelabels(labels2, pos = 1, frame = 'n')
You might want to tweak the adj parameter to set the location as you desire it.
As I couldn't parse the treetext object you provided as an example (unbalanced braces), and I'm not familiar with how read.nhx() stores node labels, you might need a little R code to extract the labels elements; you can use a bare nodelabels() to plot the numbers of the nodes on the tree to be sure that your vectors are in the correct sequence.
If you wanted labels to appear on edges rather than at nodes, the function is ape::edgelabels().
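A short sketch of the edge version, again on a random tree rather than the tree from the question. The label text is made up; two calls with different adj values put one label above and one below each branch.

```r
library(ape)

set.seed(1)
mytree <- rtree(7)             # a random tree, as in the example above
n_edges <- nrow(mytree$edge)   # one label per edge

plot(mytree)
# Two labels per edge: one nudged above the branch, one below it.
edgelabels(paste0("u", seq_len(n_edges)), adj = c(0.5, -0.5), frame = "none")
edgelabels(paste0("l", seq_len(n_edges)), adj = c(0.5, 1.5),
           frame = "none", col = "red")
```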

Smoothing geom_ribbon

I've created a plot with geom_line and geom_ribbon (image 1) and the result is okay, but for the sake of aesthetics, I'd like the line and ribbon to be smoother. I know I can use geom_smooth for the line (image 2), but I'm not sure if it's possible to smooth the ribbon. I could create a geom_smooth line for the top and bottom lines of the ribbon (image 3), but is there any way to fill in the space between those two lines?
A principled way to achieve what you want is to fit a GAM model to your data using the gam() function in mgcv and then apply the predict() function to that model over a finer grid of values for your predictor variable. The grid can cover the span defined by the range of observed values for your predictor variable. The R code below illustrates this process for a concrete example.
# load R packages
library(ggplot2)
library(mgcv)
# simulate some x and y data
# x = predictor; y = response
x <- seq(-10, 10, by = 1)
y <- 1 - 0.5*x - 2*x^2 + rnorm(length(x), mean = 0, sd = 20)
d <- data.frame(x,y)
# plot the simulated data
ggplot(data = d, aes(x,y)) +
geom_point(size=3)
# fit GAM model
m <- gam(y ~ s(x), data = d)
# define finer grid of predictor values
xnew <- seq(-10, 10, by = 0.1)
# apply predict() function to the fitted GAM model
# using the finer grid of x values
p <- predict(m, newdata = data.frame(x = xnew), se.fit = TRUE)
str(p)
# plot the estimated mean values of y (fit) at given x values
# over the finer grid of x values;
# superimpose approximate 95% confidence band for the true
# mean values of y at given x values in the finer grid
g <- data.frame(x = xnew,
fit = p$fit,
lwr = p$fit - 1.96*p$se.fit,
upr = p$fit + 1.96*p$se.fit)
head(g)
theme_set(theme_bw())
ggplot(data = g, aes(x, fit)) +
geom_ribbon(aes(ymin = lwr, ymax = upr), fill = "lightblue") +
geom_line() +
geom_point(data = d, aes(x, y), shape = 1)
This same principle would apply if you were to fit a polynomial regression model to your data using the lm() function.
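As a sketch of that lm() variant, assuming a quadratic fit is adequate for the simulated data: predict() on an lm object can return the confidence limits directly through interval = "confidence", so the band does not need to be built by hand.

```r
library(ggplot2)

# Same kind of simulated data as above.
set.seed(1)
x <- seq(-10, 10, by = 1)
y <- 1 - 0.5 * x - 2 * x^2 + rnorm(length(x), mean = 0, sd = 20)
d <- data.frame(x, y)

# Quadratic polynomial regression (the degree is an assumption).
m <- lm(y ~ poly(x, 2), data = d)

# Predict over a finer grid, with 95% confidence limits.
xnew <- seq(-10, 10, by = 0.1)
ci <- predict(m, newdata = data.frame(x = xnew), interval = "confidence")
g <- data.frame(x = xnew, ci)  # columns: x, fit, lwr, upr

ggplot(g, aes(x, fit)) +
  geom_ribbon(aes(ymin = lwr, ymax = upr), fill = "lightblue") +
  geom_line() +
  geom_point(data = d, aes(x, y), shape = 1)
```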

Visualising an individual 2d graph for all points on a plane

I have a M vs N curve (let's take it to be a sigmoid, for ease of understanding) for a given value of parameters P and Q. I need to visualise the M vs N curves for a range of values of P and Q (assume 10 values in 0 to 1, i.e. 0.1, 0.2, ..., 0.9 for both P and Q)
The only solution that I've found for this problem is a Trellis plot (essentially a matrix of plots). I'd like to know if there is any other method to visualise this sort of 4D(?) relationship besides Trellis plots. Thanks.
I'm not sure I understand what you're hoping for, so let me know if this is on the right track. Below are three examples using R.
The first is indeed a matrix of plots where each panel represents a different value of q and, within each panel, each curve represents a different value of p. The second is a 3D plot which looks at a surface based on three of the variables with the fourth fixed. The third is a Shiny app that creates the same interactive plot as in the second example but also provides a slider that allows you to change p and see how the plot changes. Unfortunately, I'm not sure how to embed the interactive plots in Stack Overflow, so I've just provided the code.
I'm not sure if there's an elegant way to look at all four variables at the same time, but maybe someone will come along with additional options.
Matrix of plots for various values of p and q
library(tidyverse)
theme_set(theme_classic())
# Function to plot
my_fun = function(x, p, q) {
1/(1 + exp(p + q*x))
}
# Parameters
params = expand.grid(p=seq(-2,2,length=6), q=seq(-1,1,length=11))
# x-values to feed to my_fun
x = seq(-10,10,0.1)
# Generate data frame for plotting
dat = map2_df(params$p, params$q, function(p, q) {
data.frame(p=p, q=q, x, y=my_fun(x, p, q))
})
ggplot(dat, aes(x,y,colour=p, group=p)) +
geom_line() +
facet_grid(. ~ q, labeller=label_both) +
labs(colour="p") +
scale_colour_gradient(low="red", high="blue") +
theme(legend.position="bottom")
3D plot with one variable fixed
The code below will produce an interactive 3D plot that you can zoom and rotate. I've fixed the value of p and drawn a plot of the y surface for a grid of x and q values.
library(rgl)
x = seq(-10,10,0.1)
q = seq(-1,1,0.01)
y = outer(x, q, function(a, b) 1/(1 + exp(1 + b*a)))
persp3d(x, q, y, col=hcl(240,80,65), specular="grey20",
xlab = "x", ylab = "q", zlab = "y")
I'm not sure how to embed the interactive plot, but here's a static image of one viewing angle:
Shiny app
The code below will create the same plot as above, but with the added ability to vary p with a slider and see how the plot changes.
Open an R script file and paste in the code below. Save it as app.r in its own directory then run the code. Both an rgl window and the Shiny app page with the slider for controlling the value of p should open. Resize the windows as desired and then move the slider to see how the function surface changes for various values of p.
library(shiny)
# Define UI for application that draws an interactive plot
ui <- fluidPage(
# Application title
titlePanel("Plot the function 1/(1 + exp(p + q*x))"),
# Sidebar with a slider input for the value of p
sidebarLayout(
sidebarPanel(
sliderInput("p",
"Vary the value of p and see how the plot changes",
min = -2,
max = 2,
value = 1,
step=0.2)
),
# Show the plot of the function surface
mainPanel(
plotOutput("distPlot")
)
)
)
# Define server logic required to draw the plot
server <- function(input, output) {
output$distPlot <- renderPlot({
library(rgl)
x = seq(-10,10,0.1)
q = seq(-1,1,0.01)
y = outer(x, q, function(a, b) 1/(1 + exp(input$p + b*a)))
persp3d(x, q, y, col=hcl(240,50,65), specular="grey20",
xlab = "x", ylab = "q", zlab = "y")
})
}
# Run the application
shinyApp(ui = ui, server = server)

Getting image.default to use class-defined Axis functions?

Compare the following:
par(mfrow = c(1, 2))
image(x=as.POSIXct(1:100, origin = "1970-1-1"), z= matrix(rnorm(100*100), 100))
plot(x=as.POSIXct(1:100, origin = "1970-1-1"), (rnorm(100)))
It seems that image (and so image.default) fails to take the class-defined Axis functions into account when plotting, while plot does. This is problematic, since I'm in the process of implementing some classes with custom pretty and format specifications that would have their own way of plotting an axis, so I want my own axis functions to be called when image is used, rather than always getting the numeric version.
I understand there's a way round this by plotting the axes manually, calling image first with xaxt = "n", for instance. But this seems inconvenient and messy. Ideally, I'd like a solution that can just drop in to overlay the existing function while breaking as few things as possible. Any thoughts?
The simplest way is to suppress the axes on the call to image() with axes = FALSE then add them yourself. E.g.:
set.seed(42)
X <- as.POSIXct(1:100, origin = "1970-1-1")
Z <- matrix(rnorm(100*100), 100)
image(x = X, z = Z, axes = FALSE)
axis(side = 2)
axis.POSIXct(side = 1, x = X)
box()
This can also be done using the Axis() S3 generic:
image(x = X, z = Z, axes = FALSE)
axis(side = 2)
Axis(x = X, side = 1)
box()
So, to actually try to answer the question, I would wrap this into a function that automates the various steps:
Image <- function(x = seq(0, 1, length.out = nrow(z)),
y = seq(0, 1, length.out = ncol(z)),
z, ...) {
# Use the function's own arguments, not the global X and Z
image(x = x, y = y, z = z, ..., axes = FALSE)
Axis(x = y, side = 2, ...)
Axis(x = x, side = 1, ...)
box()
}
Write your axis functions as S3 methods for the Axis() generic and class x and y appropriately so that your methods are called, and the above should just work. All you need to remember is to change image() to Image().
You could also write your own image() method, and add your class to x to have it called instead of image.default(). Whether that is sensible depends on whether it makes sense for x to have a class.
The reason I would do this is that the only way to change image.default() R-wide is to edit the function and assign it to the graphics namespace or source your version and call it explicitly. This would need to be done each and every time you started R. A custom function could easily be sourced or added to your own local package of misc functions that you arrange to load as R is starting so that it is automagically available. See ?Startup for details of how you might arrange for this.
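To illustrate the S3 route, here is a small sketch with a made-up class, myunits, whose axis labels get a " u" suffix. The class name and the labelling rule are assumptions for illustration only; the method signature follows the Axis() generic.

```r
# Hypothetical class "myunits": numeric values labelled with a " u" suffix.
Axis.myunits <- function(x = NULL, at = NULL, ..., side, labels = NULL) {
  if (is.null(at)) at <- pretty(unclass(x))
  if (is.null(labels)) labels <- paste(at, "u")
  axis(side = side, at = at, labels = labels, ...)
}

set.seed(42)
X <- structure(as.numeric(1:100), class = "myunits")
Z <- matrix(rnorm(100 * 100), 100)

image(x = unclass(X), z = Z, axes = FALSE)
Axis(x = X, side = 1)   # dispatches to Axis.myunits
axis(side = 2)          # plain numeric axis
box()
```

Here the class is dropped with unclass() before calling image() so that only the axis step sees it; whether that is needed depends on how the real class behaves under arithmetic.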

Error when generating pdf using script in R

I'm using R to loop through the columns of a data frame and make a graph of the resulting analysis. I don't get any errors when the script runs, but it generates a pdf that cannot be opened.
If I run the content of the script, it works fine. I wondered if there is a problem with how quickly it is looping through, so I tried to force it to pause. This did not seem to make a difference. I'm interested in any suggestions that people have, and I'm also quite new to R so suggestions as to how I can improve the approach are welcome too. Thanks.
for (i in 2:22) {
# Organise data
pop_den_z = subset(pop_den, pop_den[i] != "0") # Remove zeros
y = pop_den_z[,i] # Get y col
x = pop_den_z[,1] # get x col
y = log(y) # Log transform
# Regression
lm.0 = lm(formula = y ~ x) # make linear model
inter = summary(lm.0)$coefficients[1,1] # Get intercept
slop = summary(lm.0)$coefficients[2,1] # Get slope
# Write to File
a = c(i, inter, slop)
write(a, file = "C:/pop_den_coef.txt", ncolumns = 3, append = TRUE, sep = ",")
## Setup pdf
string = paste("C:/LEED/results/Images/R_graphs/Pop_den", paste(i-2), "City.pdf")
pdf(string, height = 6, width = 9)
p <- qplot(
x, y,
xlab = "Radius [km]",
ylab = "Population Density [log(people/km)]",
xlim = x_range,
main = "Analysis of Cities"
)
# geom_abline(intercept,slope)
p + geom_abline(intercept = inter, slope = slop, colour = "red", size = 1)
Sys.sleep(5)
### close the PDF file
dev.off()
}
The line should be
print(p + geom_abline(intercept = inter, slope = slop, colour = "red", size = 1))
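ggplot2 (and lattice) plots are only drawn when they are printed. At the top level of an interactive session this happens automatically, but inside a for loop or a function nothing is auto-printed, so the plot has to be print()ed explicitly before dev.off(). Otherwise the pdf is left without any page and cannot be opened.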
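A stripped-down sketch of the loop with the explicit print(), using made-up data and a temporary directory in place of the paths from the question:

```r
library(ggplot2)

out_dir <- tempdir()  # hypothetical output location
for (i in 1:3) {
  d <- data.frame(x = 1:50, y = log((1:50) * i * 1000))
  p <- ggplot(d, aes(x, y)) + geom_point() + ggtitle(paste("City", i))

  out_file <- file.path(out_dir, paste0("Pop_den_", i, "_City.pdf"))
  pdf(out_file, height = 6, width = 9)
  print(p)   # without print(), nothing is drawn inside the loop
  dev.off()
}
```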