barplot plot extra bars - line

I am fairly new to R, so I hope you can help me out with a simple solution. In a barplot in R, I want to add horizontal lines on top of the bars that represent the different categories on the x-axis (which represent expected values). The expected values vary per category. Here's a little piece of my script.
nem=c(0,1,2,3,4,5,6)
fish=c(103,72,44,13,3,1,1)
table=data.frame(nem,fish)
ticks=seq(1,6,1)
graph=barplot(fish,las=2,ylim=c(0,120),main="Number of nematodes per fish")
axis(1,at=graph,labels=c(0,1,2,3,4,5,6))
Hope you can help me out!
Image of barplot

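I hope this helps: just put these lines below your code.
linelength <- c(10,20,12,50,12,14,21)
# x positions of the bar midpoints (default bar width 1, spacing 0.2)
xl <- seq(from=0.7, by=1.2, along.with = fish)
# start and end height of each line: from the bar top to bar top + linelength
yl <- cbind(fish, fish + linelength)
for(z in seq_along(fish)){
  lines(x=rep(xl[z], 2), y=yl[z,])
}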

Related

3d plot with multiple lines showing the projection on the xy plane

I was wondering how to make a 3d plot with multiple lines that shows their projection on the xy plane, by means of something like fill_between but in 3D. Here is some sample code.
import numpy as np
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(12,8),subplot_kw={'projection': '3d'})
for i in np.arange(0.0,1,0.1):
    x1=np.arange(0,1-i+0.01,0.01)
    y1=1.0-i-x1
    def z_func(x,y,z):
        return x+y**2+0.5*z #can be any fn
    coordinates3= [[i,j,1-i-j] for j in np.arange(0,1-i+0.01,0.01)]
    z1=np.array([z_func(*k) for k in coordinates3])
    ax.plot(x1,y1,z1)
ax.view_init(azim=10,elev=20)
plt.show()
I'd like to have each line 'projected' on the xy plane, with a shaded filling between the curve and its projection. Does anybody know a quick way?
After the suggestion in the comments by @ImportanceOfBeingErnest, I was able to write a solution. I came up with this:
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d.art3d import Poly3DCollection

fig, ax = plt.subplots(figsize=(12,8),subplot_kw={'projection': '3d'})
prop_cycle = plt.rcParams['axes.prop_cycle']
colors = prop_cycle.by_key()['color']
# note: 'color' is the loop index into the colors list, 'i' is the offset value
for color,i in enumerate(np.arange(0.0,1,0.1)):
    x1=np.arange(0,1-i+0.01,0.01)
    y1=1.0-i-x1
    def z_func(x,y,z):
        return x+y**2+0.5*z #can be any fn
    coordinates3= [[i,j,1-i-j] for j in np.arange(0,1-i+0.01,0.01)]
    z1=np.array([z_func(*k) for k in coordinates3])
    # polygon along the curve, closed by the two end points dropped to z=0
    verts=[[(k[1],k[2],z_func(*k)) for k in coordinates3]]
    verts[0].insert(0,(coordinates3[0][1],coordinates3[0][2],0))
    verts[0].insert(0,(coordinates3[-1][1],coordinates3[-1][2],0))
    poly = Poly3DCollection(verts,color=colors[color])
    poly.set_alpha(0.2)
    ax.add_collection(poly)
    ax.plot(x1,y1,z1,linewidth=10)
ax.view_init(azim=10,elev=20)
plt.show()
One thing that puzzles me is that the shade doesn't pick up the color of the line, so I had to supply it myself. If you remove color=colors[color] from the Poly3DCollection call, you always get blue shades, whereas the lines automatically get different colors, as one can see in the question. Does anybody know the reason for this?
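A possible explanation, offered as a hedged note rather than a definitive answer: ax.plot takes its colors from the Axes property cycle, while a Poly3DCollection that you construct yourself falls back to the default patch face color unless you pass one. Under that assumption, a minimal sketch is to read the color back from the line that was just drawn and reuse it for the fill. The vectorised call to z_func and the non-interactive Agg backend are choices made only for this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, chosen only for this sketch
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d.art3d import Poly3DCollection


def z_func(x, y, z):
    return x + y**2 + 0.5 * z  # can be any function, as in the question


fig, ax = plt.subplots(figsize=(12, 8), subplot_kw={"projection": "3d"})
line_colors = []
for i in np.arange(0.0, 1.0, 0.1):
    x1 = np.arange(0, 1 - i + 0.01, 0.01)
    y1 = 1.0 - i - x1
    z1 = z_func(i, x1, y1)  # same values as z_func(*[i, j, 1-i-j]) per point

    # Draw the line first so the Axes assigns it the next color of the cycle.
    (line,) = ax.plot(x1, y1, z1)
    color = line.get_color()
    line_colors.append(color)

    # Closed polygon: end points dropped to z=0 plus the curve itself.
    verts = [(x1[0], y1[0], 0.0), *zip(x1, y1, z1), (x1[-1], y1[-1], 0.0)]
    poly = Poly3DCollection([verts], color=color)
    poly.set_alpha(0.2)
    ax.add_collection3d(poly)

ax.view_init(azim=10, elev=20)
```

This keeps line and shade in sync even if the color cycle is changed later, because the fill never indexes into the cycle directly.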

Matplotlib/Seaborn: Boxplot collapses on x axis

I am creating a series of boxplots in order to compare different cancer types with each other (based on 5 categories). For plotting I use seaborn/matplotlib. It works fine for most of the cancer types (see image, right); however, for some the x axis collapses slightly (see image, left) or strongly (see image, middle).
https://i.imgur.com/dxLR4B4.png
Looking at the code seaborn uses to plot a box/violin plot, https://github.com/mwaskom/seaborn/blob/36964d7ffba3683de2117d25f224f8ebef015298/seaborn/categorical.py (line 961):
violin_data = remove_na(group_data[hue_mask])
I realized that this happens when there are too many NaNs.
Is there any way to prevent this collapsing by code only?
I do not want to modify my dataframe (replace the NaNs by zero).
Below you find my code:
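import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# pf_in and pf_out are the input and output file paths
boxp_df=pd.read_csv(pf_in,sep="\t",skip_blank_lines=False)
fig, ax = plt.subplots(figsize=(10, 10))
sns.violinplot(data=boxp_df, ax=ax)
plt.xticks(rotation=-45)
plt.ylabel("label")
plt.tight_layout()
plt.savefig(pf_out)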
The output is a differently sized plot per cancer type (depending on whether any category is completely NaN).
I expect each plot to have the same width.
Update
Trying to use the order parameter as suggested leads to the following output:
https://i.imgur.com/uSm13Qw.png
Maybe this toy example helps?
|Cat1|Cat2|Cat3|Cat4|Cat5
|3.93| |0.52| |6.01
|3.34| |0.89| |2.89
|3.39| |1.96| |4.63
|1.59| |3.66| |3.75
|2.73| |0.39| |2.87
|0.08| |1.25| |-0.27
Update
Apparently, the problem is not the data but the length of the title:
https://github.com/matplotlib/matplotlib/issues/4413
Therefore I would close the question.
@Diziet, should I delete it, or might my issue help others?
Sorry for not including the line below in the code example:
ax.set_title("VERY LONG TITLE", fontsize=20)
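If the long title really is the cause, one workaround that might help (an assumption on my side, not something confirmed in the linked issue) is to wrap the title onto several lines before calling tight_layout, so that it no longer has to fit next to the axes in a single line. A minimal sketch using the standard-library textwrap module; the title text and the wrap width of 40 characters are made up for illustration:

```python
import textwrap

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, chosen only for this sketch
import matplotlib.pyplot as plt

# Hypothetical long title, standing in for the real one.
long_title = ("VERY LONG TITLE " * 8).strip()

# Break the title into lines of at most 40 characters (arbitrary width).
wrapped_title = textwrap.fill(long_title, width=40)

fig, ax = plt.subplots(figsize=(10, 10))
ax.set_title(wrapped_title, fontsize=20)
fig.tight_layout()
```

Whether this fully avoids the squeezed axes will depend on the matplotlib version and the figure size, so it is worth checking the saved figure.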
It's hard to be sure without data to test it with, but I think you can pass the names of your categories/cancers to the order= parameter. This forces seaborn to use/display those, even if they are empty.
For instance:
import seaborn as sns
tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", data=tips, order=['Thur','Fri','Sat','Freedom Day','Sun','Durin\'s Day'])

matplot pandas plotting multiple y values on the same column

I am trying to plot with matplotlib, but with separate lines based on the value of a column that is neither x nor y.
For example, this is my DataFrame:
code  reqs    value
AGB   253319  57010.16528
ABC   242292  35660.58176
DCC   240440  36587.45336
CHB   172441  57825.83052
DEF   148357  34129.71166
Calling df.plot(x='reqs',y='value',figsize=(8,4)) on it yields a plot with a single line.
What I'm looking for is a plot with multiple lines, one for each of the codes. Right now it just draws one line and ignores the code column.
I tried searching for an answer, but every question I found asks about multiple y columns. I don't have multiple y's; I have the same y, but split by a different column.
(I'm surely using the wrong terms to describe what I'm trying to do; hopefully this example and image make sense.)
The result should look something like this:
I worked out how to do exactly that, in case anyone is curious:
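import matplotlib.pyplot as plt

plt_df = df
fig, ax = plt.subplots()
# group by the column name (not a list) so that key is the plain code string
for key,grp in plt_df.groupby('code'):
    ax = grp.plot(ax=ax, kind ='line',x='reqs',y='value',label=key,figsize=(20,4),title = "someTitle")
plt.show()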
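For anyone who wants to try the idea without the original data, here is a self-contained sketch of the same group-then-plot approach. The DataFrame below is hypothetical: the first row of each code is taken from the question, the second row is invented so that each code has more than one point to connect. It draws with ax.plot directly instead of DataFrame.plot, which is just one way to do it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, chosen only for this sketch
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data: every second row per code is made up for illustration.
df = pd.DataFrame({
    "code":  ["AGB", "AGB", "ABC", "ABC", "DCC", "DCC"],
    "reqs":  [253319, 260000, 242292, 250000, 240440, 245000],
    "value": [57010.16528, 58000.0, 35660.58176, 36000.0, 36587.45336, 37000.0],
})

fig, ax = plt.subplots(figsize=(8, 4))
# One line per code: groupby yields (code, sub-frame) pairs in sorted order.
for code, grp in df.groupby("code"):
    grp = grp.sort_values("reqs")  # keep each line monotonic along x
    ax.plot(grp["reqs"], grp["value"], marker="o", label=code)

ax.set_xlabel("reqs")
ax.set_ylabel("value")
ax.legend(title="code")
```

The marker is there so that a code with only a single row still shows up as a point.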

Add Marker Annotations for Seaborn regplot

This is my first question on the site, however I've spent a lot of time finding valuable answers here!
I've searched all over the site and can't find a good solution to my problem; hopefully someone can help! I have a pandas DataFrame from which I've created a regplot, and I'd really like to add an annotation to each marker based on another column.
Here is the code for my existing plot:
fig, ax = plt.subplots()
fig.set_size_inches(8,5)
sns.regplot(x=brakev["Curb_Weight"], y=brakev["Braking_60_0"])
sns.regplot(x=brakev["Curb_Weight"], y=brakev["Braking_60_0"], fit_reg=False)
Here is the diagram: regplot
I found a proposal on the Python Graph Gallery (and others on Stack Overflow), but I'm struggling to get it to work:
for line in range(0,df.shape[0]):
    p1.text(df.x[line]+0.2, df.y[line], df.group[line], horizontalalignment='left', size='medium', color='black', weight='semibold')
I'd like to add an annotation from the column 'Model' next to each marker. I'm less concerned about the position, color and font size at the moment, but help with those would also be welcome.
Here is the brakev.head() for my DataFrame:
brakev.head()
             Model  Curb_Weight  Braking_60_0
0  Transit Connect       3580.0         132.0
1            NV200       3690.0         129.0
2         Sprinter       3710.0         132.0
3          Express       3620.0         135.0
4          Transit       3960.0         136.0
Sorry if this is a duplicate (I'm sure it is, but I can't find it!). Thanks for the help!
I was able to find a solution! The key was to first store each of the x, y, and label elements into a list:
Curb_Weight = brakev.Curb_Weight.tolist()
Braking_60_0 = brakev.Braking_60_0.tolist()
Model = brakev.Model.tolist()
After this, the proposed solution was relatively easy to implement:
import matplotlib.pyplot as plt
import seaborn as sns

fig, ax = plt.subplots()
fig.set_size_inches(20,12)
sns.regplot(data=brakev, x='Curb_Weight', y='Braking_60_0')
p1 = sns.regplot(data=brakev, x='Curb_Weight', y='Braking_60_0',fit_reg=False,marker='o',scatter_kws={'s':50})
# place each model name slightly to the right of its marker
for line in range(0,brakev.shape[0]):
    p1.text(Curb_Weight[line]+1.8, Braking_60_0[line], Model[line],
            horizontalalignment='left',size='medium',color='black',weight='semibold')
The result includes the annotation label next to the markers, and is relatively legible excluding a few overlaps:
Diagram
If anyone has suggestions for optimization, or ideas how to "flip" the orientation of the label when another marker is within a specific range to avoid the overlap, let me know!
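As one possible simplification, the intermediate lists are not strictly needed: the rows can be iterated directly, and ax.annotate can place the label at a fixed offset in points instead of in data units. The sketch below is only an illustration under a few assumptions: it uses the five rows from brakev.head() above, draws with plain matplotlib scatter instead of sns.regplot to stay self-contained, and the 5-point offset is arbitrary. Since regplot returns a matplotlib Axes, the same loop should work on that Axes too.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, chosen only for this sketch
import matplotlib.pyplot as plt
import pandas as pd

# The five rows shown by brakev.head() in the question.
brakev = pd.DataFrame({
    "Model": ["Transit Connect", "NV200", "Sprinter", "Express", "Transit"],
    "Curb_Weight": [3580.0, 3690.0, 3710.0, 3620.0, 3960.0],
    "Braking_60_0": [132.0, 129.0, 132.0, 135.0, 136.0],
})

fig, ax = plt.subplots(figsize=(8, 5))
ax.scatter(brakev["Curb_Weight"], brakev["Braking_60_0"], s=50)

labels = []
for row in brakev.itertuples(index=False):
    # xytext is an offset in points relative to the marker position,
    # so the gap stays the same regardless of the axis scale.
    label = ax.annotate(
        row.Model,
        xy=(row.Curb_Weight, row.Braking_60_0),
        xytext=(5, 0),
        textcoords="offset points",
        ha="left",
        va="center",
        weight="semibold",
    )
    labels.append(label)

ax.set_xlabel("Curb_Weight")
ax.set_ylabel("Braking_60_0")
```

Flipping a label to the other side would then amount to passing a negative x offset together with ha="right" for the rows that need it; deciding which rows those are is left open here.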

matplotlib/pyplot: print only ticks once in scatter plot?

I am looking for a way to clean-up the ticks in my pyplot scatter plot.
To create a scatter plot from a Pandas dataset column with strings as elements, I followed the example in [2] and got a nice scatter plot:
The input is 10k data points where the x axis has only ~200 unique 'names', which got mapped to scalars for plotting. Obviously, plotting all 10k ticks on the x axis is a bit cluttered. So I am looking for a way to print each unique tick only once, and not once per data point.
My code looks like:
fig2 = plt.figure()
WNsUniques, WNs = numpy.unique(taskDataFrame['modificationhost'], return_inverse=True)
scatterWNs = fig2.add_subplot(111)
scatterWNs.scatter(WNs, taskDataFrame['cpuconsumptiontime'])
scatterWNs.set(xticks=range(len(WNsUniques)), xticklabels=WNsUniques)
plt.xticks(rotation='vertical')
plt.savefig("%s_WNs-CPUTime_scatter.%s" % (dfName,"pdf"))
Actually, I was hoping that setting the plot's x ticks to the unique names would be sufficient, but apparently not? It is probably something easy, but how do I reduce the ticks for my subplot to the unique ones (should they not already be unique, as returned by numpy.unique)?
Maybe someone has an idea for me?
Cheers and thanks,
Thomas
You can use the set_xticks method to accomplish this. Note that 200 axis ticks with labels are still quite a lot to force on a small plot like this, and this is what you might already be seeing with the above code. Without complete code to play with, I can't say for sure.
Additionally, what is the size of WNsUniques? That can easily be used to check if your call to unique is doing what you think.
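To make both suggestions concrete, here is a rough sketch, not tested against your data. The host names and CPU times are randomly generated stand-ins, and labelling only every 10th unique name is an arbitrary choice to keep the axis readable. It first checks what numpy.unique returned and then passes a reduced set of positions to set_xticks.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, chosen only for this sketch
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical stand-in data: 10k points drawn from 200 host names.
rng = np.random.default_rng(0)
host_names = [f"wn{n:03d}" for n in range(200)]
hosts = rng.choice(host_names, size=10_000)
cpu_time = rng.exponential(scale=100.0, size=hosts.size)

# uniques holds each name once; codes maps every data point to its index.
uniques, codes = np.unique(hosts, return_inverse=True)
print(len(uniques), "unique names for", len(codes), "points")

fig, ax = plt.subplots(figsize=(12, 4))
ax.scatter(codes, cpu_time, s=2)

# Label only every `step`-th unique name so the labels stay legible.
step = 10
tick_positions = np.arange(0, len(uniques), step)
ax.set_xticks(tick_positions)
ax.set_xticklabels(uniques[tick_positions], rotation="vertical")
fig.tight_layout()
```

If len(uniques) comes out close to the number of data points rather than around 200, the names probably differ in some way that is easy to miss, for example trailing whitespace.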