Retrieve yerr value from bar object in matplotlib - matplotlib

How can I retrieve a yerr value from an ax.bar object?
A bar chart is created with a single call; each parameter of ax.bar() is a collection, including the yerr value.
bar_list = ax.bar(x_value_list, y_value_list, color=color_list,
                  tick_label=columns, yerr=confid_95_list, align='center')
Later on, I want to be able to retrieve both the y value as well as the yerr value of each individual bar in the chart.
I iterate through the bar_list collection and I can retrieve the y value, but I don't know how to retrieve the yerr value.
Getting the y value looks like this:
for bar in bar_list:
    y_val = bar.get_height()
How can I get the yerr? Is there something like a bar.get_yerr() method? (It isn't bar.get_yerr())
I would like to be able to:
for bar in bar_list:
    y_err = bar.get_yerr()

Note that in the above example confid_95_list is already the list of errors. So there is no need to obtain them from the plot.
To answer the question: in the line for bar in bar_list, bar is a Rectangle and thus has no errorbar associated with it.
However, bar_list is a bar container with an errorbar attribute, which holds the return value of the errorbar creation. From it you can get the individual segments of the line collection. Each line goes from yminus = y - y_error to yplus = y + y_error; the line collection stores only the points yminus and yplus. As an example:
import numpy as np
import matplotlib.pyplot as plt

means = (20, 35)
std = (2, 4)
ind = np.arange(len(means))
p = plt.bar(ind, means, width=0.35, color='#d62728', yerr=std)
# the first non-None child of the errorbar container is the line collection
lc = [i for i in p.errorbar.get_children() if i is not None][0]
for yerr in lc.get_segments():
    print(yerr[:,1])  # print start and end point
    print(yerr[1,1] - yerr[:,1].mean())  # print error
will print
[ 18. 22.]
2.0
[ 31. 39.]
4.0
So this works well for symmetric errorbars. For asymmetric errorbars, you would additionally need to take the point itself into account.
means = (20, 35)
std = [(2,4),(5,3)]  # first row: lower errors, second row: upper errors
ind = np.arange(len(means))
p = plt.bar(ind, means, width=0.35, color='#d62728', yerr=std)
lc = [i for i in p.errorbar.get_children() if i is not None][0]
for point, yerr in zip(p, lc.get_segments()):
    print(yerr[:,1])  # print start and end point
    print(yerr[:,1] - point.get_height())  # print error
will print
[ 18. 25.]
[-2. 5.]
[ 31. 38.]
[-4. 3.]
In the end this seems unnecessarily complicated, because you only retrieve the values that you initially put in (means and std), and you could simply use those values for whatever you want to do.
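The segment arithmetic used in both examples can be checked without matplotlib. The following is a minimal sketch, assuming each segment is given as its two y endpoints; the helper name errors_from_segment is hypothetical and not part of matplotlib.

```python
def errors_from_segment(height, segment_ys):
    """Return (lower_error, upper_error) for one bar.

    height      -- the bar height, as returned by bar.get_height()
    segment_ys  -- the two y endpoints of the errorbar line (yminus, yplus)
    """
    y_minus, y_plus = min(segment_ys), max(segment_ys)
    return height - y_minus, y_plus - height


# Values taken from the asymmetric example above.
heights = [20, 35]
segments = [(18, 25), (31, 38)]

errors = [errors_from_segment(h, s) for h, s in zip(heights, segments)]
print(errors)  # [(2, 5), (4, 3)]
```

For symmetric errorbars both entries of each tuple are equal, so either one can be used as the yerr value.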

Related

Create line network from closest points with boundaries

I have a set of points and I want to create line / road network from those points. Firstly, I need to determine the closest point from each of the points. For that, I used the KD Tree and developed a code like this:
def closestPoint(source, X = None, Y = None):
    df = pd.DataFrame(source).copy(deep = True) #Ensure source is a dataframe, working on a copy to keep the datasource
    if(X is None and Y is None):
        raise ValueError ("Please specify coordinate")
    elif(not X in df.keys() and not Y in df.keys()):
        raise ValueError ("X and/or Y is/are not in column names")
    else:
        df["coord"] = tuple(zip(df[X],df[Y])) #create a coordinate
        if (df["coord"].duplicated):
            uniq = df.drop_duplicates("coord")["coord"]
            uniqval = list(uniq.get_values())
            dupl = df[df["coord"].duplicated()]["coord"]
            duplval = list(dupl.get_values())
            for kq,vq in uniq.items():
                clstu = spatial.KDTree(uniqval).query(vq, k = 3)[1]
                df.at[kq,"coord"] = [vq,uniqval[clstu[1]]]
                if([uniqval[clstu[1]],vq] in list(df["coord"]) ):
                    df.at[kq,"coord"] = [vq,uniqval[clstu[2]]]
            for kd,vd in dupl.items():
                clstd = spatial.KDTree(duplval).query(vd,k = 1)[1]
                df.at[kd,"coord"] = [vd,duplval[clstd]]
        else:
            val = df["coord"].get_values()
            for k,v in df["coord"].items():
                clst = spatial.KDTree(val).query(vd, k = 3)[1]
                df.at[k,"coord"] = [v,val[clst[1]]]
                if([val[clst[1]],v] in list (df["coord"])):
                    df.at[k,"coord"] = [v,val[clst[2]]]
    return df["coord"]
The code returns the closest points around each point. However, I need to ensure that no double lines are created (e.g. (x,y) to (x1,y1) and (x1,y1) to (x,y)), and I also need to ensure that each point is used only once as the starting point of a line and only once as the end point of a line, even if it is the closest point to several other points.
Below is the visualization of the result:
Result of the code
What I want:
What I want
I've also tried to separate the origin and target coordinate and do it like this:
df["coord"] = tuple(zip(df[X],df[Y])) #create a coordinate
df["target"] = "" #create a column for target points
count = 2 # create a count iteration
if (df["coord"].duplicated):
    uniq = df.drop_duplicates("coord")["coord"]
    uniqval = list(uniq.get_values())
    for kq,vq in uniq.items():
        clstu = spatial.KDTree(uniqval).query(vq, k = count)[1]
        while not vq in (list(df["target"]) and list(df["coord"])):
            clstu = spatial.KDTree(uniqval).query(vq, k = count)[1]
            df.set_value(kq, "target", uniqval[clstu[count-1]])
        else:
            count += 1
            clstu = spatial.KDTree(uniqval).query(vq, k = count)[1]
            df.set_value(kq, "target", uniqval[clstu[count-1]])
but this returns an error
IndexError: list index out of range
Can anyone help me with this? Many thanks!
Answering now about the global strategy, here is what I would do (rough pseudo-algorithm):
current_point = one starting point in uniqval
while (uniqval not empty)
    construct KDTree from uniqval and use it for next line
    next_point = point in uniqval closest to current_point
    record next_point as target for current_point
    remove current_point from uniqval
    current_point = next_point
What you will obtain is a linear graph joining all your points, using closest neighbors "in some way". I don't know if it will fit your needs. You would also obtain a linear graph by taking next_point at random...
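The pseudo-algorithm can be sketched in plain Python as follows. This is only an illustration under stated assumptions: it swaps the KDTree for a brute-force nearest-point search to stay dependency-free, it removes the current point before searching so that a point is never matched with itself, and the function name nearest_neighbor_chain is hypothetical.

```python
import math


def nearest_neighbor_chain(points, start_index=0):
    """Greedily link each point to its closest not-yet-visited point.

    Returns a list of (origin, target) pairs forming one linear chain.
    Brute-force search is used here instead of a KDTree (an assumption
    made for brevity; a KDTree would be rebuilt on each step instead).
    """
    remaining = list(points)
    current = remaining.pop(start_index)  # drop the start so it cannot match itself
    links = []
    while remaining:
        # closest unvisited point to the current one
        next_point = min(remaining, key=lambda p: math.dist(current, p))
        links.append((current, next_point))
        remaining.remove(next_point)
        current = next_point
    return links


points = [(0, 0), (5, 0), (1, 0), (2, 1)]
links = nearest_neighbor_chain(points)
print(links)
# [((0, 0), (1, 0)), ((1, 0), (2, 1)), ((2, 1), (5, 0))]
```

Because every visited point is removed from the candidates, each point appears at most once as an origin and at most once as a target, and no reversed duplicate of a line can occur.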
It is hard to comment on your global strategy without further detail about the kind of road network you want to obtain. So let me just comment on your specific code and explain why the "out of range" error happens. I hope this helps.
First, are you aware that (list_a and list_b) returns list_a if it is empty, else list_b? Second, isn't the condition (vq in list(df["coord"])) always True? If so, your while loop never runs its body and always executes the else clause, so count grows on every iteration of the for loop, and at the last iteration (count-1) will be greater than the total number of (unique) points. Hence your KDTree query does not return enough points and clstu[count-1] is out of range.
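The behaviour of `and` on two lists can be checked with a small sketch. The values below are made up to mimic the "coord" and "target" columns from the question.

```python
coords = [(0, 0), (1, 1)]
targets = ["", ""]  # non-empty list of placeholders, like the "target" column

# `and` returns one of its operands; it does not merge the lists.
combined = targets and coords
print(combined is coords)   # True: targets is non-empty, so coords is returned
print([] and coords)        # []: an empty first operand is returned as-is

vq = (1, 1)
print(vq in (targets and coords))      # True: only coords is searched
print(vq in targets and vq in coords)  # False: both lists are checked
```

Writing the two membership tests explicitly, as in the last line, is what checks both columns.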

Visualising an individual 2d graph for all points on a plane

I have a M vs N curve (let's take it to be a sigmoid, for ease of understanding) for a given value of parameters P and Q. I need to visualise the M vs N curves for a range of values of P and Q (assume 10 values in 0 to 1, i.e. 0.1, 0.2, ..., 0.9 for both P and Q)
The only solution that I've found for this problem is a Trellis plot (essentially a matrix of plots). I'd like to know if there is any other method to visualise this sort of 4d(?) relationship besides Trellis plots. Thanks.
I'm not sure I understand what you're hoping for, so let me know if this is on the right track. Below are three examples using R.
The first is indeed a matrix of plots where each panel represents a different value of q and, within each panel, each curve represents a different value of p. The second is a 3D plot which looks at a surface based on three of the variables with the fourth fixed. The third is a Shiny app that creates the same interactive plot as in the second example but also provides a slider that allows you to change p and see how the plot changes. Unfortunately, I'm not sure how to embed the interactive plots in Stackoverflow so I've just provided the code.
I'm not sure if there's an elegant way to look at all four variables at the same time, but maybe someone will come along with additional options.
Matrix of plots for various values of p and q
library(tidyverse)
theme_set(theme_classic())
# Function to plot
my_fun = function(x, p, q) {
1/(1 + exp(p + q*x))
}
# Parameters
params = expand.grid(p=seq(-2,2,length=6), q=seq(-1,1,length=11))
# x-values to feed to my_fun
x = seq(-10,10,0.1)
# Generate data frame for plotting
dat = map2_df(params$p, params$q, function(p, q) {
data.frame(p=p, q=q, x, y=my_fun(x, p, q))
})
ggplot(dat, aes(x,y,colour=p, group=p)) +
geom_line() +
facet_grid(. ~ q, labeller=label_both) +
labs(colour="p") +
scale_colour_gradient(low="red", high="blue") +
theme(legend.position="bottom")
3D plot with one variable fixed
The code below will produce an interactive 3D plot that you can zoom and rotate. I've fixed the value of p and drawn a plot of the y surface for a grid of x and q values.
library(rgl)
x = seq(-10,10,0.1)
q = seq(-1,1,0.01)
y = outer(x, q, function(a, b) 1/(1 + exp(1 + b*a)))
persp3d(x, q, y, col=hcl(240,80,65), specular="grey20",
xlab = "x", ylab = "q", zlab = "y")
I'm not sure how to embed the interactive plot, but here's a static image of one viewing angle:
Shiny app
The code below will create the same plot as above, but with the added ability to vary p with a slider and see how the plot changes.
Open an R script file and paste in the code below. Save it as app.r in its own directory then run the code. Both an rgl window and the Shiny app page with the slider for controlling the value of p should open. Resize the windows as desired and then move the slider to see how the function surface changes for various values of p.
library(shiny)
# Define UI for application that draws an interactive plot
ui <- fluidPage(
# Application title
titlePanel("Plot the function 1/(1 + exp(p + q*x))"),
# Sidebar with a slider input for the parameter p
sidebarLayout(
sidebarPanel(
sliderInput("p",
"Vary the value of p and see how the plot changes",
min = -2,
max = 2,
value = 1,
step=0.2)
),
# Show the plot output
mainPanel(
plotOutput("distPlot")
)
)
)
# Define server logic required to draw the plot
server <- function(input, output) {
output$distPlot <- renderPlot({
library(rgl)
x = seq(-10,10,0.1)
q = seq(-1,1,0.01)
y = outer(x, q, function(a, b) 1/(1 + exp(input$p + b*a)))
persp3d(x, q, y, col=hcl(240,50,65), specular="grey20",
xlab = "x", ylab = "q", zlab = "y")
})
}
# Run the application
shinyApp(ui = ui, server = server)

Issue when trying to plot geom_tile using ggplotly

I would like to plot a ggplot2 image using ggplotly
What I am trying to do is to first plot rectangles with a grey fill without any aesthetic mapping, and then in a second step to plot tiles and change their colors based on aesthetics. My code works when I use ggplot but crashes when I try to use ggplotly to make the graph interactive.
Here is a sample code
library(ggplot2)
library(data.table)
library(plotly)
library(dplyr)
x = rep(c("1", "2", "3"), 3)
y = rep(c("K", "B","A"), each=3)
z = sample(c(NA,"A","L"), 9,replace = TRUE)
df <- data.table(x,y,z)
p<-ggplot(df)+
geom_tile(aes(x=x,y=y),width=0.9,height=0.9,fill="grey")
p<-p+geom_tile(data=filter(df,z=="A"),aes(x=x,y=y,fill=z),width=0.9,height=0.9)
p
But when I type this
ggplotly(p)
I get the following error
Error in `[.data.frame`(g, , c("fill_plotlyDomain", "fill")) :
undefined columns selected
The versions I use are
> packageVersion("plotly")
[1] ‘4.7.1’
> packageVersion("ggplot2")
[1] ‘2.2.1.9000’
##########Edited example for Arthur
p<-ggplot(df)+
geom_tile(aes(x=x,y=y,fill="G"),width=0.9,height=0.9)
p<- p+geom_tile(data=filter(df,z=="A"),aes(x=x,y=y,fill=z),width=0.9,height=0.9)
p<-p+ scale_fill_manual(
guide = guide_legend(title = "test",
override.aes = list(
fill =c("red","white") )
),
values = c("red","grey"),
labels=c("A",""))
p
This works, but ggplotly(p) adds the grey bar labeled G to the legend.
The output of the ggplotly function is a list with the plotly class. It gets printed as Plotly graph but you can still work with it as a list. Moreover, the documentation indicates that modifying the list makes it possible to clear all or part of the legend. One only has to understand how the data is structured.
p<-ggplot(df)+
geom_tile(aes(x=x,y=y,fill=z),width=0.9,height=0.9)+
scale_fill_manual(values = c(L='grey', A='red'), na.value='grey')
p2 <- ggplotly(p)
str(p2)
The global legend is here in p2$x$layout$showlegend and setting this to false displays no legend at all.
The group-specific legend appears in each of the 9 p2$x$data elements, each with its own showlegend attribute. Only 3 of them are set to TRUE, corresponding to the 3 keys in the legend. The following loop thus clears all the undesired labels:
for(i in seq_along(p2$x$data)){
if(p2$x$data[[i]]$legendgroup!='A'){
p2$x$data[[i]]$showlegend <- FALSE
}
}
Voilà!
This works here:
p <- ggplot(df)+
geom_tile(aes(x=x,y=y,fill=z),width=0.9,height=0.9)+
scale_fill_manual(values = c(L='grey', A='red'), na.value='grey')
ggplotly(p)
I guess your problem comes from the use of 2 different data sources, df and filter(df,z=="A"), with columns with the same name.
[Note this is not an Answer Yet]
(Putting for reference, as it is beyond the limits for comments.)
The problem is rather complicated.
I just finished debugging the code of plotly. It seems like it's occurring here.
I have opened an issue in GitHub
Here is the minimal code for the reproduction of the problem.
library(ggplot2)
set.seed(1503)
df <- data.frame(x = rep(1:3, 3),
y = rep(1:3, 3),
z = sample(c("A","B"), 9,replace = TRUE),
stringsAsFactors = F)
p1 <- ggplot(df)+
geom_tile(aes(x=x,y=y, fill="grey"), color = "black")
p2 <- ggplot(df)+
geom_tile(aes(x=x,y=y),fill="grey", color = "black")
class(plotly::ggplotly(p1))
#> [1] "plotly" "htmlwidget"
class(plotly::ggplotly(p2))
#> Error in `[.data.frame`(g, , c("fill_plotlyDomain", "fill")): undefined columns selected

bar plot - annotate the bars with some values

Immediately after creating a bar plot using pandas dataframe.plot function, I am trying to annotate the bars with some values that I have in a list. I have put the annotate command in a for loop. But, as soon as I run this piece of code, my ipython notebook stops working and it crashes.
When I remove the annotation part, the bar plot works fine. What could be the reason for this?
req_index = df.index[~df.index.isin(['99'])]
ax = df.ix[req_index,'count'].plot(kind="barh", figsize=(24,20), \
    color = (0.10588235294117647, 0.6196078431372549, 0.4666666666666667))
y_values = ax.get_yticks().astype('int')
for i,indx in enumerate(req_index):
    label_text = str(round(df.ix[indx,'percentage'], 4))
    print label_text
    x = df.ix[indx,'count']
    y = y_values[i]
    ax.annotate(label_text, xy = (x + 70000,y-3), size = 20)
    #break

Storing plot objects in a list

I asked this question yesterday about storing a plot within an object. I tried implementing the first approach (aware that I did not specify that I was using qplot() in my original question) and noticed that it did not work as expected.
library(ggplot2) # add ggplot2
string = "C:/example.pdf" # Setup pdf
pdf(string,height=6,width=9)
x_range <- range(1,50) # Specify Range
# Create a list to hold the plot objects.
pltList <- list()
pltList[]
for(i in 1 : 16){
# Organise data
y = (1:50) * i * 1000 # Get y col
x = (1:50) # get x col
y = log(y) # Use natural log
# Regression
lm.0 = lm(formula = y ~ x) # make linear model
inter = summary(lm.0)$coefficients[1,1] # Get intercept
slop = summary(lm.0)$coefficients[2,1] # Get slope
# Make plot name
pltName <- paste( 'a', i, sep = '' )
# make plot object
p <- qplot(
x, y,
xlab = "Radius [km]",
ylab = "Services [log]",
xlim = x_range,
main = paste("Sample",i)
) + geom_abline(intercept = inter, slope = slop, colour = "red", size = 1)
print(p)
pltList[[pltName]] = p
}
# close the PDF file
dev.off()
I have used sample numbers in this case so the code runs if it is just copied. I did spend a few hours puzzling over this but I cannot figure out what is going wrong. It writes the first set of pdfs without problem, so I have 16 pdfs with the correct plots.
Then when I use this piece of code:
string = "C:/test_tabloid.pdf"
pdf(string, height = 11, width = 17)
grid.newpage()
pushViewport( viewport( layout = grid.layout(3, 3) ) )
vplayout <- function(x, y){viewport(layout.pos.row = x, layout.pos.col = y)}
counter = 1
# Page 1
for (i in 1:3){
for (j in 1:3){
pltName <- paste( 'a', counter, sep = '' )
print( pltList[[pltName]], vp = vplayout(i,j) )
counter = counter + 1
}
}
dev.off()
the result I get is the last linear model line (abline) on every graph, but the data does not change. When I check my list of plots, it seems that all of them become overwritten by the most recent plot (with the exception of the abline object).
A less important secondary question was how to generate a multi-page pdf with several plots on each page, but the main goal of my code was to store the plots in a list that I could access at a later date.
Ok, so if your plot command is changed to
p <- qplot(data = data.frame(x = x, y = y),
x, y,
xlab = "Radius [km]",
ylab = "Services [log]",
xlim = x_range,
ylim = c(0,10),
main = paste("Sample",i)
) + geom_abline(intercept = inter, slope = slop, colour = "red", size = 1)
then everything works as expected. Here's what I suspect is happening (although Hadley could probably clarify things). When ggplot2 "saves" the data, what it actually does is save a data frame, and the names of the parameters. So for the command as I have given it, you get
> summary(pltList[["a1"]])
data: x, y [50x2]
mapping: x = x, y = y
scales: x, y
faceting: facet_grid(. ~ ., FALSE)
-----------------------------------
geom_point:
stat_identity:
position_identity: (width = NULL, height = NULL)
mapping: group = 1
geom_abline: colour = red, size = 1
stat_abline: intercept = 2.55595281266726, slope = 0.05543539319091
position_identity: (width = NULL, height = NULL)
However, if you don't specify a data parameter in qplot, all the variables get evaluated in the current scope, because there is no attached (read: saved) data frame.
data: [0x0]
mapping: x = x, y = y
scales: x, y
faceting: facet_grid(. ~ ., FALSE)
-----------------------------------
geom_point:
stat_identity:
position_identity: (width = NULL, height = NULL)
mapping: group = 1
geom_abline: colour = red, size = 1
stat_abline: intercept = 2.55595281266726, slope = 0.05543539319091
position_identity: (width = NULL, height = NULL)
So when the plot is generated the second time around, rather than using the original values, it uses the current values of x and y.
I think you should use the data argument in qplot, i.e., store your vectors in a data frame.
See Hadley's book, Section 4.4:
The restriction on the data is simple: it must be a data frame. This is restrictive, and unlike other graphics packages in R. Lattice functions can take an optional data frame or use vectors directly from the global environment. ...
The data is stored in the plot object as a copy, not a reference. This has two important consequences: if your data changes, the plot will not; and ggplot2 objects are entirely self-contained so that they can be save()d to disk and later load()ed and plotted without needing anything else from that session.
There is a bug in your code concerning list subscripting. It should be
pltList[[pltName]]
not
pltList[pltName]
Note:
class(pltList[1])
[1] "list"
pltList[1] is a list containing the first element of pltList.
class(pltList[[1]])
[1] "ggplot"
pltList[[1]] is the first element of pltList.
For your second question: Multi-page pdfs are easy -- see help(pdf):
onefile: logical: if true (the default) allow multiple figures in one
file. If false, generate a file with name containing the
page number for each page. Defaults to ‘TRUE’.
For your main question, I don't understand if you want to store the plot inputs in a list for later processing, or the plot outputs. If it is the latter, I am not sure that plot() returns an object you can store and retrieve.
Another suggestion regarding your second question would be to use either Sweave or Brew as they will give you complete control over how you display your multi-page pdf.
Have a look at this related question.