How to change colors in stat_summary() - ggplot2

I am trying to plot two columns of raw data (I have used melt to combine them into one data frame) and then add separate error bars for each. However, I want to make the raw data for each column one pair of colors and the error bars another set of colors, but I can't seem to get it to work. The plot I am getting is at the link below. I want to have different color pairs for the raw data and for the error bars. A simple reproducible example is coded below, for illustrative purposes.
dat2.m<-data.frame(obs=c(2,4,6,8,12,16,2,4,6),variable=c("raw","raw","raw","ip","raw","ip","raw","ip","ip"),value=runif(9,0,10))
c <- ggplot(dat2.m, aes(x=obs, y=value, color=variable,fill=variable,size = 0.02)) +geom_jitter(size=1.25) + scale_colour_manual(values = c("blue","Red"))
c<- c+stat_summary(fun.data="median_hilow",fun.args=(conf.int=0.95),aes(color=variable), position="dodge",geom="errorbar", size=0.5,lty=1)
print(c)
[1]: http://i.stack.imgur.com/A5KHk.jpg

For the record: I think that this is a really, really bad idea. Unless you have a use case where this is crucial, I think you should re-examine your plan.
However, you can get around it by adding a new set of variables, padded with a space at the end. You will want/need to play around with the legends, but this should work (though it is definitely ugly):
dat2.m<- data.frame(obs=c(2,4,6,8,12,16,2,4,6),variable=c("raw","raw","raw","ip","raw","ip","raw","ip","ip"),value=runif(9,0,10))
c <- ggplot(dat2.m, aes(x=obs, y=value, color=variable,fill=variable,size = 0.02)) +geom_jitter(size=1.25) + scale_colour_manual(values = c("blue","Red","green","purple"))
c<- c+stat_summary(fun.data="median_hilow",fun.args=(conf.int=0.95),aes(color=paste(variable," ")), position="dodge",geom="errorbar", size=0.5,lty=1)
print(c)

One way around this would be to use repetitive calls to geom_point and stat_summary. Use the data argument of those functions to feed subsets of your dataset into each call, and set the color attribute outside of aes(). It's repetitive and somewhat defeats the compactness of ggplot, but it'd do.
c <- ggplot(dat2.m, aes(x = obs, y = value, size = 0.02)) +
geom_jitter(data = subset(dat2.m, variable == 'raw'), color = 'blue', size=1.25) +
geom_jitter(data = subset(dat2.m, variable == 'ip'), color = 'red', size=1.25) +
stat_summary(data = subset(dat2.m, variable == 'raw'), fun.data="median_hilow", fun.args=(conf.int=0.95), color = 'pink', position="dodge",geom="errorbar", size=0.5,lty=1) +
stat_summary(data = subset(dat2.m, variable == 'ip'), fun.data="median_hilow", fun.args=(conf.int=0.95), color = 'green', position="dodge",geom="errorbar", size=0.5,lty=1)
print(c)

Related

Why do some xlims and ylims produce this error in ggplot and sf?

I am learning to use ggplot with spatial data using sf.
When I try to produce the following plot I get an error:
library(sf)
library(ggplot2)
library(rnaturalearth)
library(rnaturalearthdata)
world <- ne_countries(scale = "medium", returnclass = "sf")
ggplot(data = world) +
geom_sf() +
coord_sf(xlim = c(-102.15, -74.12), ylim = c(0, 33.97), expand = FALSE)
# Error in st_cast.POINT(x[[1]], to, ...) :
# cannot create MULTILINESTRING from POINT
However, if I just very slightly adjust the ylims in either direction, it works!:
ggplot(data = world) +
geom_sf() +
coord_sf(xlim = c(-102.15, -74.12), ylim = c(0.01, 33.97), expand = FALSE)
or
ggplot(data = world) +
geom_sf() +
coord_sf(xlim = c(-102.15, -74.12), ylim = c(-0.01, 33.97), expand = FALSE)
So that's a fine if rather hacky solution, but I am wondering what the issue is? On the final plot, I notice a little islands creeping into the map (the galapagos?)
Is the tip of this polygon somehow interfering with the first piece of code by looking like a point rather than a polygon? Or is something else going on, and how can I address is a bit more elegantly?
I don't believe that the problem you describe is caused by the Galapagos - as it is present at scale = "small" which omits these islands.
I have tried another source of the world country dataset (the one from natural earth is not spherically valid, so I wanted to rule this issue out). The error remains, and now I suspect it has something to do with the code in {ggplot2} - specifically in how it handles the equator (i.e. zero) as a limit when the default expand = TRUE is overriden by user (it might be worth filing an issue at their github).
In the meantime I suggest two possible approaches:
crop your area of interest outside of your ggplot call (this may be preferable for a variety of reasons - see ggplot2 and sf: geom_sf_text within limits set by coord_sf)
avoid the expand = FALSE in your coord_sf() call
For cropping of your world object at data level (and not at presentation level in your ggplot call) consider this code:
library(sf)
library(ggplot2)
library(rnaturalearth)
library(rnaturalearthdata)
world <- ne_countries(scale = "medium", returnclass = "sf") %>%
st_make_valid()
bounds <- matrix(c(-102.15, 0,
-74.12, 0,
-74.12, 33.97,
-102.15, 33.97,
-102.15, 0),
byrow = TRUE, ncol = 2) %>%
list() %>%
st_polygon() %>%
st_sfc(crs = 4326)
small_world <- st_intersection(world, bounds)
ggplot(data = small_world) +
geom_sf()

Adding numeric label to geom_hline in ggplot2

I have produced the graph pictured using the following code -
ggboxplot(xray50g, x = "SupplyingSite", y = "PercentPopAff",
fill = "SupplyingSite", legend = "none") +
geom_point() +
rotate_x_text(angle = 45) +
# ADD HORIZONTAL LINE AT BASE MEAN
geom_hline(yintercept = mean(xray50g$PercentPopAff), linetype = 2)
What I would like to do is label the horizontal geom_hline with it's numeric value so that it appears on the y axis.
I have provided an example of what I would like to achieve in the second image.
Could somebody please help with the code to achieve this for my plot?
Thanks!
There's a really great answer that should help you out posted here. As long as you are okay with formatting the "extra tick" to match the existing axis, the easiest solution is to just create your axis breaks manually and specify within scale_y_continuous. See below where I use an example to label a vertical dotted line on the x-axis using this method.
df <- data.frame(x=rnorm(1000, mean = 0.5))
ggplot(df, aes(x)) +
geom_histogram(binwidth = 0.1) +
geom_vline(xintercept = 0.5, linetype=2) +
scale_x_continuous(breaks=c(seq(from=-4,to=4,by=2), 0.5))
Again, for other methods, including those where you want the extra tick mark formatted differently than the rest of the axis, check the top answer here.

Issue when trying to plot geom_tile using ggplotly

I would like to plot a ggplot2 image using ggplotly
What I am trying to do is to initially plot rectangles of grey fill without any aesthetic mapping, and then in a second step to plot tiles and change colors based on aesthetics. My code is working when I use ggplot but crashes when I try to use ggplotly to transform my graph into interactive
Here is a sample code
library(ggplot2)
library(data.table)
library(plotly)
library(dplyr)
x = rep(c("1", "2", "3"), 3)
y = rep(c("K", "B","A"), each=3)
z = sample(c(NA,"A","L"), 9,replace = TRUE)
df <- data.table(x,y,z)
p<-ggplot(df)+
geom_tile(aes(x=x,y=y),width=0.9,height=0.9,fill="grey")
p<-p+geom_tile(data=filter(df,z=="A"),aes(x=x,y=y,fill=z),width=0.9,height=0.9)
p
But when I type this
ggplotly(p)
I get the following error
Error in [.data.frame(g, , c("fill_plotlyDomain", "fill")) :
undefined columns selected
The versions I use are
> packageVersion("plotly")
1 ‘4.7.1
packageVersion("ggplot2")
1 ‘2.2.1.9000’
##########Edited example for Arthur
p<-ggplot(df)+
geom_tile(aes(x=x,y=y,fill="G"),width=0.9,height=0.9)
p<- p+geom_tile(data=filter(df,z=="A"),aes(x=x,y=y,fill=z),width=0.9,height=0.9)
p<-p+ scale_fill_manual(
guide = guide_legend(title = "test",
override.aes = list(
fill =c("red","white") )
),
values = c("red","grey"),
labels=c("A",""))
p
This works
but ggplotly(p) adds the grey bar labeled G in the legend
The output of the ggplotly function is a list with the plotly class. It gets printed as Plotly graph but you can still work with it as a list. Moreover, the documentation indicates that modifying the list makes it possible to clear all or part of the legend. One only has to understand how the data is structured.
p<-ggplot(df)+
geom_tile(aes(x=x,y=y,fill=z),width=0.9,height=0.9)+
scale_fill_manual(values = c(L='grey', A='red'), na.value='grey')
p2 <- ggplotly(p)
str(p2)
The global legend is here in p2$x$layout$showlegend and setting this to false displays no legend at all.
The group-specific legend appears at each of the 9 p2$x$data elements each time in an other showlegend attribute. Only 3 of them are set to TRUE, corresponding to the 3 keys in the legend. The following loop thus clears all the undesired labels:
for(i in seq_along(p2$x$data)){
if(p2$x$data[[i]]$legendgroup!='A'){
p2$x$data[[i]]$showlegend <- FALSE
}
}
Voilà!
This works here:
ggplot(df)+
geom_tile(aes(x=x,y=y,fill=z),width=0.9,height=0.9)+
scale_fill_manual(values = c(L='grey', A='red'), na.value='grey')
ggplotly(p)
I guess your problem comes from the use of 2 different data sources, df and filter(df,z=="A"), with columns with the same name.
[Note this is not an Answer Yet]
(Putting for reference, as it is beyond the limits for comments.)
The problem is rather complicated.
I just finished debugging the code of plotly. It seems like it's occurring here.
I have opened an issue in GitHub
Here is the minimal code for the reproduction of the problem.
library(ggplot2)
set.seed(1503)
df <- data.frame(x = rep(1:3, 3),
y = rep(1:3, 3),
z = sample(c("A","B"), 9,replace = TRUE),
stringsAsFactors = F)
p1 <- ggplot(df)+
geom_tile(aes(x=x,y=y, fill="grey"), color = "black")
p2 <- ggplot(df)+
geom_tile(aes(x=x,y=y),fill="grey", color = "black")
class(plotly::ggplotly(p1))
#> [1] "plotly" "htmlwidget"
class(plotly::ggplotly(p2))
#> Error in `[.data.frame`(g, , c("fill_plotlyDomain", "fill")): undefined columns selected

Getting image.default to use class-defined Axis functions?

Compare the following:
par(mfrow = 2)
image(x=as.POSIXct(1:100, origin = "1970-1-1"), z= matrix(rnorm(100*100), 100))
plot(x=as.POSIXct(1:100, origin = "1970-1-1"), (rnorm(100)))
It seems like image (and so, image.default) fails to take the class-defined Axis functions into account when plotting, while plot does. This is problematic, since I'm in the process of implementing some classes with custom pretty and format specifications that would have their own way of plotting an axis, so I want to having my own axis functions be called when image is used, than always use the numeric version.
I understand there's a way round this by plotting axis manually, calling image first with xaxt = "n", for instance. But this seems inconvenient and messy. Ideally, I'd like a solution that can just drop in to overlay the existing function while breaking as few things as possible. Any thoughts?
The simplest way is to suppress the axes on the call to image() with axes = FALSE then add them yourself. E.g.:
set.seed(42)
X <- as.POSIXct(1:100, origin = "1970-1-1")
Z <- matrix(rnorm(100*100), 100)
image(x = X, z = Z, axes = FALSE)
axis(side = 2)
axis.POSIXct(side = 1, x = X)
box()
This can also be done using the Axis() S3 generic:
image(x = X, z = Z, axes = FALSE)
axis(side = 2)
Axis(x = X, side = 1)
box()
So to actually try to Answer the Question, I would wrap this into a function that automates the various steps:
Image <- function(x = seq(0, 1, length.out = nrow(z)),
y = seq(0, 1, length.out = ncol(z)),
z, ...) {
image(x = X, z = Z, ..., axes = FALSE)
Axis(x = y, side = 2, ...)
Axis(x = X, side = 1, ...)
box()
}
Write your axis functions as S3 methods for the Axis() generic and class x and y appropriately do that your methods are called and the above should just work. All you need to remember is to change image() to Image().
You could also write your own image() method, and add your class to x to have it called instead of image.default() Depends on whether it makes sense for x to have a class or not?
The reason I would do this is that the only way to change image.default() R-wide is to edit the function and assign it to the graphics namespace or source your version and call it explicitly. This would need to be done each and every time you started R. A custom function could easily be sourced or added to your own local package of misc functions that you arrange to load as R is starting so that it is automagically available. See ?Startup for details of how you might arrange for this.

Storing plot objects in a list

I asked this question yesterday about storing a plot within an object. I tried implementing the first approach (aware that I did not specify that I was using qplot() in my original question) and noticed that it did not work as expected.
library(ggplot2) # add ggplot2
string = "C:/example.pdf" # Setup pdf
pdf(string,height=6,width=9)
x_range <- range(1,50) # Specify Range
# Create a list to hold the plot objects.
pltList <- list()
pltList[]
for(i in 1 : 16){
# Organise data
y = (1:50) * i * 1000 # Get y col
x = (1:50) # get x col
y = log(y) # Use natural log
# Regression
lm.0 = lm(formula = y ~ x) # make linear model
inter = summary(lm.0)$coefficients[1,1] # Get intercept
slop = summary(lm.0)$coefficients[2,1] # Get slope
# Make plot name
pltName <- paste( 'a', i, sep = '' )
# make plot object
p <- qplot(
x, y,
xlab = "Radius [km]",
ylab = "Services [log]",
xlim = x_range,
main = paste("Sample",i)
) + geom_abline(intercept = inter, slope = slop, colour = "red", size = 1)
print(p)
pltList[[pltName]] = p
}
# close the PDF file
dev.off()
I have used sample numbers in this case so the code runs if it is just copied. I did spend a few hours puzzling over this but I cannot figure out what is going wrong. It writes the first set of pdfs without problem, so I have 16 pdfs with the correct plots.
Then when I use this piece of code:
string = "C:/test_tabloid.pdf"
pdf(string, height = 11, width = 17)
grid.newpage()
pushViewport( viewport( layout = grid.layout(3, 3) ) )
vplayout <- function(x, y){viewport(layout.pos.row = x, layout.pos.col = y)}
counter = 1
# Page 1
for (i in 1:3){
for (j in 1:3){
pltName <- paste( 'a', counter, sep = '' )
print( pltList[[pltName]], vp = vplayout(i,j) )
counter = counter + 1
}
}
dev.off()
the result I get is the last linear model line (abline) on every graph, but the data does not change. When I check my list of plots, it seems that all of them become overwritten by the most recent plot (with the exception of the abline object).
A less important secondary question was how to generate a muli-page pdf with several plots on each page, but the main goal of my code was to store the plots in a list that I could access at a later date.
Ok, so if your plot command is changed to
p <- qplot(data = data.frame(x = x, y = y),
x, y,
xlab = "Radius [km]",
ylab = "Services [log]",
xlim = x_range,
ylim = c(0,10),
main = paste("Sample",i)
) + geom_abline(intercept = inter, slope = slop, colour = "red", size = 1)
then everything works as expected. Here's what I suspect is happening (although Hadley could probably clarify things). When ggplot2 "saves" the data, what it actually does is save a data frame, and the names of the parameters. So for the command as I have given it, you get
> summary(pltList[["a1"]])
data: x, y [50x2]
mapping: x = x, y = y
scales: x, y
faceting: facet_grid(. ~ ., FALSE)
-----------------------------------
geom_point:
stat_identity:
position_identity: (width = NULL, height = NULL)
mapping: group = 1
geom_abline: colour = red, size = 1
stat_abline: intercept = 2.55595281266726, slope = 0.05543539319091
position_identity: (width = NULL, height = NULL)
However, if you don't specify a data parameter in qplot, all the variables get evaluated in the current scope, because there is no attached (read: saved) data frame.
data: [0x0]
mapping: x = x, y = y
scales: x, y
faceting: facet_grid(. ~ ., FALSE)
-----------------------------------
geom_point:
stat_identity:
position_identity: (width = NULL, height = NULL)
mapping: group = 1
geom_abline: colour = red, size = 1
stat_abline: intercept = 2.55595281266726, slope = 0.05543539319091
position_identity: (width = NULL, height = NULL)
So when the plot is generated the second time around, rather than using the original values, it uses the current values of x and y.
I think you should use the data argument in qplot, i.e., store your vectors in a data frame.
See Hadley's book, Section 4.4:
The restriction on the data is simple: it must be a data frame. This is restrictive, and unlike other graphics packages in R. Lattice functions can take an optional data frame or use vectors directly from the global environment. ...
The data is stored in the plot object as a copy, not a reference. This has two
important consequences: if your data changes, the plot will not; and ggplot2 objects are entirely self-contained so that they can be save()d to disk and later load()ed and plotted without needing anything else from that session.
There is a bug in your code concerning list subscripting. It should be
pltList[[pltName]]
not
pltList[pltName]
Note:
class(pltList[1])
[1] "list"
pltList[1] is a list containing the first element of pltList.
class(pltList[[1]])
[1] "ggplot"
pltList[[1]] is the first element of pltList.
For your second question: Multi-page pdfs are easy -- see help(pdf):
onefile: logical: if true (the default) allow multiple figures in one
file. If false, generate a file with name containing the
page number for each page. Defaults to ‘TRUE’.
For your main question, I don't understand if you want to store the plot inputs in a list for later processing, or the plot outputs. If it is the latter, I am not sure that plot() returns an object you can store and retrieve.
Another suggestion regarding your second question would be to use either Sweave or Brew as they will give you complete control over how you display your multi-page pdf.
Have a look at this related question.