While creating a table underneath the axis of a plot, is there a way to create some whitespace between the axis and the table using matplotlib? - matplotlib

I am trying to create a table inside a plot, right underneath the plot's axis, using matplotlib.
I am using the plt.table function to do this.
However, when I create the table, it is drawn directly against the axis, so it overlaps the axis labels.
Is there a way to add some white space in between?
The code looks something like this:
for key, value in arrayToPlot.iteritems():
    ax1 = fig.add_subplot(2, 2, 1)
    if value["HorErr"]:
        cdf = []
        #calculate percentile stats for the value["HorErr"]
        cdfArrayPointer[key]["HorErr"]["percentileStats"]=libMath.percentileForListofPercentiles( value["HorErr"], PERCENTILE, validPointsOnly = True )
        # now calculate the cdf values
        cdfArrayPointer[key]["HorErr"]["cdf"] = libMath.cdf( value["HorErr"], 2, 400, validPointsOnly = True)
        for k, v in cdfArrayPointer[key]["HorErr"]["cdf"].iteritems():
            cdf.append( v )
        #plot the cdf value
        ax1.plot(cdf, 'o-', label = ('HorErr for ' + str( key) ), color = getColour(key), markersize=3)
        plt.title("CDF Plot of 2D-Horizontal Error", size = 8)
        plt.ylabel('Percentile %', size = 7)
        plt.xlabel('Horizontal Error [m]', size = 6)
        plt.axis([0, 150, 0, 110])
        leg = plt.legend(loc = 4)
        setLegendSize( leg, 7)
        # creating the table to be drawn on the axis
        tableTexts["rows"].append(key)
        tableTexts["rowColour"].append(getColour(key))
        if (len(tableTexts["col"]) == 0):
            tableTexts["col"] = tuple(cdfArrayPointer[key]["HorErr"]["percentileStats"].keys())
        tableTexts["values"].append(cdfArrayPointer[key]["HorErr"]["percentileStats"].values())
the_table = plt.table(cellText=tableTexts["values"], rowLabels= tableTexts["rows"], rowColours= tableTexts["rowColour"] ,colLabels= tableTexts["col"], loc="bottom")

What about breaking your figure up using subplot?
You could have the axis in one subplot, and the table in another. (See example near bottom of page here)
I can illustrate further if you can't follow.
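To illustrate the subplot idea, here is a minimal sketch, not a drop-in replacement for the code in the question. It assumes a two-row layout where the lower axes exists only to hold the table. The function name plot_with_table_below, the sample data, and the height_ratios and hspace values are all made up for illustration; hspace is the knob that controls the gap between the plot and the table.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt


def plot_with_table_below(x, y, cell_text, row_labels, col_labels):
    """Draw a line plot in the upper axes and a table in a separate lower axes."""
    # Two stacked axes: the plot gets three times the height of the table area,
    # and hspace leaves vertical room for the x-axis label between them.
    fig, (ax_plot, ax_table) = plt.subplots(
        2, 1, gridspec_kw={"height_ratios": [3, 1], "hspace": 0.4}
    )

    ax_plot.plot(x, y, "o-", markersize=3)
    ax_plot.set_xlabel("Horizontal Error [m]")
    ax_plot.set_ylabel("Percentile %")

    # The lower axes is only a container for the table, so hide its frame and ticks.
    ax_table.axis("off")
    table = ax_table.table(
        cellText=cell_text,
        rowLabels=row_labels,
        colLabels=col_labels,
        loc="center",
    )
    return fig, ax_plot, ax_table, table


# Hypothetical sample data, only to make the sketch runnable.
x = [0, 10, 20, 30, 40]
y = [0, 35, 60, 85, 100]
cell_text = [["12.1", "25.4", "38.9"], ["10.7", "22.0", "41.3"]]
row_labels = ["device A", "device B"]
col_labels = ["50%", "67%", "95%"]

fig, ax_plot, ax_table, table = plot_with_table_below(
    x, y, cell_text, row_labels, col_labels
)
fig.savefig("plot_with_table.png")
```

Because the table lives in its own axes, it can no longer collide with the axis labels of the plot; increasing hspace widens the gap further.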

Related

geom_bar for total counts of binned continuous variable

I'm really struggling to achieve what feels like an incredibly basic geom_bar plot. I would like the sum of y to be represented by one solid bar (with colour = black outline) in bins of 10 for x. I know that stat = "identity" is what is creating the unnecessary individual blocks in each bar, but I can't find an alternative that gets me to what is so close to my end goal. I cheated and made the desired plot below in Illustrator.
I don't really want to code x as a factor for the bins, as I want to keep the format of the axis ticks and text rather than having text such as "0 -10", "10 -20", etc. Is there a way to do this in ggplot without needing to use the summarise or cut functions on the raw data? I am also aware of the geom_col and stat_count options but, again, can't achieve my desired outcome.
DF as below, where y = counts at various values of a continuous variable x. Also a factor variable of type.
y = c(1 ,1, 3, 2, 1, 1, 2, 1, 1, 1, 1, 1, 4, 1, 1,1, 2, 1, 2, 3, 2, 2, 1)
x = c(26.7, 28.5, 30.0, 34.8, 35.0, 36.4, 38.6, 40.0, 42.1, 43.7, 44.1, 45.0, 45.5, 47.4, 48.0, 57.2, 57.8, 64.2, 65.0, 66.7, 68.0, 74.4, 94.1)
type = c(rep("Type 1", 20), "Type 2", rep("Type 1", 2))
df<-data.frame(x,y,type)
Bar plot of total y count for each bin of x - trying to fill by total of type, but getting individual proportions as shown by line colour = black. Would like total for each type in each bar.
ggplot(df,aes(y=y, x=x))+
geom_bar(stat = "identity",color = "black", aes(fill = type))+
scale_x_binned(limits = c(20,100))+
scale_y_continuous(expand = c(0, 0), breaks = seq(0,10,2)) +
xlab("")+
ylab("Total Count")
Or trying to just have the total count within each bin but don't want the internal lines in the bars, just the outer colour = black for each bar
ggplot(df,aes(y=y, x=x))+
geom_col(fill = "#00C3C6", color = "black")+
scale_x_binned(limits = c(20,100))+
scale_y_continuous(expand = c(0, 0), breaks = seq(0,10,2)) +
xlab("")+
ylab("Total Count")
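Here is one way to do it, transforming the data first and then using geom_col:
library(dplyr)
library(ggplot2)
# assign each x to its 10-wide bin, then total y per bin and type;
# the + 5 places each bar at the centre of its bin rather than on its left edge
df <- df |>
  mutate(bins = floor(x/10) * 10 + 5) |>
  group_by(bins, type) |>
  summarise(y = sum(y), .groups = "drop")
ggplot(data = df,
       aes(y = y,
           x = bins))+
  geom_col(aes(fill = type),
           color = "black")+
  scale_x_continuous(breaks = seq(0,100,10)) +
  scale_y_continuous(expand = c(0, 0),
                     breaks = seq(0,10,2)) +
  xlab("")+
  ylab("Total Count")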

ggplot2: add title changes point colors <-> scale_color_manual removes ggtitle

I am facing a silly point color problem in a dot plot with ggplot2. I have a whole table of data, from which I take the relevant rows to make a dot plot. With scale_color_manual my points get colored according to the named palette and the factor Geno specified in aes(), but when I simply want to add a title specifying the cell line used, the points go back to the automatic yellow and purple. Adding the title first and setting scale_color_manual as the last layer changes the point colors and removes the title.
What is wrong here? I don't get it, and it is a bit frustrating.
Thanks for your help!
Here's reproducible code to get my whole df and the subset for the plots:
# df of data to plot
exp <- c(rep(284, times = 6), rep(285, times = 12))
geno <- c(rep(rep(c("WT", "KO"), each =3), times = 6))
line <- c(rep(5, times = 6),rep(8, times= 12), rep(5, times =12), rep(8, times = 6))
ttt <- c(rep(c(0, 10, 60), times = 10), rep(c("ZAc60", "Cu60", "Cu200"), times = 2))
rep <- c(rep(1, times = 12), rep(2, times = 6), rep(c(1,2), times = 6), rep(1, times = 6))
rel_expr <- c(0.20688185, 0.21576131, 0.94046028, 0.30327675, 0.22865200,
0.92941881, 0.13787508, 0.13325281, 0.22114990, 0.95591724,
1.03239718, 0.83339248, 0.15332420, 0.17558160, 0.22475604,
1.02356351, 0.77882000, 0.69214403, 0.16874097, 0.15548158,
0.45207943, 0.28123760, 0.23500083, 0.51588856, 0.1399634,
0.14610184, 1.06716713, 0.16517801, 0.34736164, 0.64773650,
0.18334429, 0.05924757, 0.01803593, 0.86685230, 0.39554685,
0.25764805)
df_all <- data.frame(exp, geno, line, ttt, rep, rel_expr)
names(df_all) <- c("EXP", "Geno", "Line", "TTT", "Rep", "Rel_Expr")
str(df_all)
# make Geno an ordered factor
df_all$Geno <- ordered(df_all$Geno, levels = c("WT", "KO"))
# select set of whole dataset for current plot
df_ions <- df_all[df_all$Line == 8 & !df_all$TTT %in% c(10, 60),]
# add a treatment as factor columns fTTT
df_ions$fTTT <- ordered(df_ions$TTT, levels = c("0", "ZAc60", "Cu60", "Cu200"))
str(df_ions)
# plot rel_exp vs factor treatment, color points by geno
# with named color palette
library(ggplot2)
col_palette <- c("#000000", "#1356BC")
names(col_palette) <- c("WT", "KO")
plt <- ggplot(df_ions, aes(x = fTTT, y = Rel_Expr, color = Geno)) +
geom_jitter(width = 0.1)
plt # intermediate_plt_1.png
plt + scale_color_manual(values = col_palette) # intermediate_plt_2.png
plt + ggtitle("mRPTEC8") # final_plot.png

Adding multiple labels to a branch in a phylogenetic tree using geom_label

I am very new to R, so I am sorry if this question is obvious. I would like to add multiple labels to branches in a phylogenetic tree, but I can only figure out how to add one label per branch. I am using the following code:
treetext = "(Japon[&&NHX:S=2],(((Al,Luteo),(Loam,(Niet,Cal))),(((Car,Bar),(Aph,Long[&&NHX:S=1],((Yam,Stig),((Zey,Semp),(A,(Hap,(This,That))))))));"
mytree <- read.nhx(textConnection(treetext))
ggtree(mytree) + geom_tiplab() +
geom_label(aes(x=branch, label=S))
I can add multiple symbols to a branch using the code below, but it is so labor-intensive that I may as well do it by hand:
ggtree(mytree) +
geom_tiplab()+
geom_nodepoint(aes(subset = node == 32, x = x - .5),
size = 5, colour = "black", shape = 15) +
geom_nodepoint(aes(subset = node == 32, x = x - 2),
size = 5, colour = "gray", shape = 15)
A solution using the "ape" package would be:
library(ape)
mytree <- rtree(7) # A random tree
labels1 <- letters[1:6]
labels2 <- LETTERS[1:6]
plot(mytree)
# Setting location of label with `adj`
nodelabels(labels1, adj = c(1, -1), frame = 'none')
# We could also use `pos =` 1: below node; 3: above node
nodelabels(labels2, pos = 1, frame = 'n')
You might want to tweak the adj parameter to set the location as you desire it.
As I couldn't parse the treetext object you provided as an example (unbalanced braces), and I'm not familiar with how read.nhx() stores node labels, you might need a little R code to extract the labels elements; you can use a bare nodelabels() to plot the numbers of the nodes on the tree to be sure that your vectors are in the correct sequence.
If you wanted labels to appear on edges rather than at nodes, the function is ape::edgelabels().

Smoothing geom_ribbon

I've created a plot with geom_line and geom_ribbon (image 1) and the result is okay, but for the sake of aesthetics, I'd like the line and ribbon to be smoother. I know I can use geom_smooth for the line (image 2), but I'm not sure if it's possible to smooth the ribbon. I could create a geom_smooth line for the top and bottom lines of the ribbon (image 3), but is there any way to fill in the space between those two lines?
A principled way to achieve what you want is to fit a GAM model to your data using the gam() function in mgcv and then apply the predict() function to that model over a finer grid of values for your predictor variable. The grid can cover the span defined by the range of observed values for your predictor variable. The R code below illustrates this process for a concrete example.
# load R packages
library(ggplot2)
library(mgcv)
# simulate some x and y data
# x = predictor; y = response
x <- seq(-10, 10, by = 1)
y <- 1 - 0.5*x - 2*x^2 + rnorm(length(x), mean = 0, sd = 20)
d <- data.frame(x,y)
# plot the simulated data
ggplot(data = d, aes(x,y)) +
geom_point(size=3)
# fit GAM model
m <- gam(y ~ s(x), data = d)
# define finer grid of predictor values
xnew <- seq(-10, 10, by = 0.1)
# apply predict() function to the fitted GAM model
# using the finer grid of x values
p <- predict(m, newdata = data.frame(x = xnew), se.fit = TRUE)
str(p)
# plot the estimated mean values of y (fit) at given x values
# over the finer grid of x values;
# superimpose approximate 95% confidence band for the true
# mean values of y at given x values in the finer grid
g <- data.frame(x = xnew,
fit = p$fit,
lwr = p$fit - 1.96*p$se.fit,
upr = p$fit + 1.96*p$se.fit)
head(g)
theme_set(theme_bw())
ggplot(data = g, aes(x, fit)) +
geom_ribbon(aes(ymin = lwr, ymax = upr), fill = "lightblue") +
geom_line() +
geom_point(data = d, aes(x, y), shape = 1)
This same principle would apply if you were to fit a polynomial regression model to your data using the lm() function.

Storing plot objects in a list

I asked this question yesterday about storing a plot within an object. I tried implementing the first approach (aware that I did not specify that I was using qplot() in my original question) and noticed that it did not work as expected.
library(ggplot2) # add ggplot2
string = "C:/example.pdf" # Setup pdf
pdf(string,height=6,width=9)
x_range <- range(1,50) # Specify Range
# Create a list to hold the plot objects.
pltList <- list()
pltList[]
for(i in 1 : 16){
# Organise data
y = (1:50) * i * 1000 # Get y col
x = (1:50) # get x col
y = log(y) # Use natural log
# Regression
lm.0 = lm(formula = y ~ x) # make linear model
inter = summary(lm.0)$coefficients[1,1] # Get intercept
slop = summary(lm.0)$coefficients[2,1] # Get slope
# Make plot name
pltName <- paste( 'a', i, sep = '' )
# make plot object
p <- qplot(
x, y,
xlab = "Radius [km]",
ylab = "Services [log]",
xlim = x_range,
main = paste("Sample",i)
) + geom_abline(intercept = inter, slope = slop, colour = "red", size = 1)
print(p)
pltList[[pltName]] = p
}
# close the PDF file
dev.off()
I have used sample numbers in this case so the code runs if it is just copied. I did spend a few hours puzzling over this but I cannot figure out what is going wrong. It writes the first set of pdfs without problem, so I have 16 pdfs with the correct plots.
Then when I use this piece of code:
string = "C:/test_tabloid.pdf"
pdf(string, height = 11, width = 17)
grid.newpage()
pushViewport( viewport( layout = grid.layout(3, 3) ) )
vplayout <- function(x, y){viewport(layout.pos.row = x, layout.pos.col = y)}
counter = 1
# Page 1
for (i in 1:3){
for (j in 1:3){
pltName <- paste( 'a', counter, sep = '' )
print( pltList[[pltName]], vp = vplayout(i,j) )
counter = counter + 1
}
}
dev.off()
the result I get is the last linear model line (abline) on every graph, but the data does not change. When I check my list of plots, it seems that all of them become overwritten by the most recent plot (with the exception of the abline object).
A less important secondary question was how to generate a multi-page pdf with several plots on each page, but the main goal of my code was to store the plots in a list that I could access at a later date.
Ok, so if your plot command is changed to
p <- qplot(data = data.frame(x = x, y = y),
x, y,
xlab = "Radius [km]",
ylab = "Services [log]",
xlim = x_range,
ylim = c(0,10),
main = paste("Sample",i)
) + geom_abline(intercept = inter, slope = slop, colour = "red", size = 1)
then everything works as expected. Here's what I suspect is happening (although Hadley could probably clarify things). When ggplot2 "saves" the data, what it actually does is save a data frame, and the names of the parameters. So for the command as I have given it, you get
> summary(pltList[["a1"]])
data: x, y [50x2]
mapping: x = x, y = y
scales: x, y
faceting: facet_grid(. ~ ., FALSE)
-----------------------------------
geom_point:
stat_identity:
position_identity: (width = NULL, height = NULL)
mapping: group = 1
geom_abline: colour = red, size = 1
stat_abline: intercept = 2.55595281266726, slope = 0.05543539319091
position_identity: (width = NULL, height = NULL)
However, if you don't specify a data parameter in qplot, all the variables get evaluated in the current scope, because there is no attached (read: saved) data frame.
data: [0x0]
mapping: x = x, y = y
scales: x, y
faceting: facet_grid(. ~ ., FALSE)
-----------------------------------
geom_point:
stat_identity:
position_identity: (width = NULL, height = NULL)
mapping: group = 1
geom_abline: colour = red, size = 1
stat_abline: intercept = 2.55595281266726, slope = 0.05543539319091
position_identity: (width = NULL, height = NULL)
So when the plot is generated the second time around, rather than using the original values, it uses the current values of x and y.
I think you should use the data argument in qplot, i.e., store your vectors in a data frame.
See Hadley's book, Section 4.4:
The restriction on the data is simple: it must be a data frame. This is restrictive, and unlike other graphics packages in R. Lattice functions can take an optional data frame or use vectors directly from the global environment. ...
The data is stored in the plot object as a copy, not a reference. This has two
important consequences: if your data changes, the plot will not; and ggplot2 objects are entirely self-contained so that they can be save()d to disk and later load()ed and plotted without needing anything else from that session.
There is a bug in your code concerning list subscripting. It should be
pltList[[pltName]]
not
pltList[pltName]
Note:
class(pltList[1])
[1] "list"
pltList[1] is a list containing the first element of pltList.
class(pltList[[1]])
[1] "ggplot"
pltList[[1]] is the first element of pltList.
For your second question: Multi-page pdfs are easy -- see help(pdf):
onefile: logical: if true (the default) allow multiple figures in one
file. If false, generate a file with name containing the
page number for each page. Defaults to ‘TRUE’.
For your main question, I don't understand if you want to store the plot inputs in a list for later processing, or the plot outputs. If it is the latter, I am not sure that plot() returns an object you can store and retrieve.
Another suggestion regarding your second question would be to use either Sweave or Brew as they will give you complete control over how you display your multi-page pdf.
Have a look at this related question.