Add Marker Annotations for Seaborn regplot - pandas

This is my first question on the site, however I've spent a lot of time finding valuable answers here!
I've searched all over the site, and can't find a good solution to my problem, hopefully someone can help! I have a pandas database that I've created a regplot, however I'd really like to add annotations for each marker based on another column.
Here is the code for my existing plot:
fig, ax = plt.subplots()
fig.set_size_inches(8,5)
sns.regplot(x=brakev["Curb_Weight"], y=brakev["Braking_60_0"])
sns.regplot(x=brakev["Curb_Weight"], y=brakev["Braking_60_0"], fit_reg=False)
Here is the diagram: regplot
. I found a proposal on the Python Graph Gallery (and others on Stack Overflow), but I'm struggling to get it to work:
for line in range(0,df.shape[0]):
p1.text(df.x[line]+0.2, df.y[line], df.group[line], horizontalalignment='left', size='medium', color='black', weight='semibold')
I'd like to add an annotation from the column 'Model' next to each marker. I'm less concerned about the position, color, font size at the moment, but that would also be helpful.
Here is the brakev.head() for my database:
brakev.head()
Model Curb_Weight Braking_60_0
0 Transit Connect 3580.0 132.0
1 NV200 3690.0 129.0
2 Sprinter 3710.0 132.0
3 Express 3620.0 135.0
4 Transit 3960.0 136.0
Sorry if this is a duplication, (I'm sure it is, but I can't find it!!). Thanks for the help!

I was able to find a solution! The key was to first store each of the x, y, and label elements into a list:
Curb_Weight = brakev.Curb_Weight.tolist()
Braking_60_0 = brakev.Braking_60_0.tolist()
Model = brakev.Model.tolist()
After this, the proposed solution was relatively easy to implement:
fig, ax = plt.subplots()
fig.set_size_inches(20,12)
sns.regplot(data=brakev, x='Curb_Weight', y='Braking_60_0')
p1 = sns.regplot(data=brakev, x='Curb_Weight', y='Braking_60_0',fit_reg=False,marker='o',scatter_kws={'s':50})
for line in range(0,brakev.shape[0]):
p1.text(Curb_Weight[line]+1.8, Braking_60_0[line], Model[line],
horizontalalignment='left',size='medium',color='black',weight='semibold')
The result includes the annotation label next to the markers, and is relatively legible excluding a few overlaps:
Diagram
If anyone has suggestions for optimization, or ideas how to "flip" the orientation of the label when another marker is within a specific range to avoid the overlap, let me know!

Related

Turn off x-axis marginal distribution axes on jointplot using seaborn package

There is a similar question here, however I fail to adapt the provided solutions to my case.
I want to have a jointplot with kind=hex while removing the marginal plot of the x-axis as it contains no information. In the linked question the suggestion is to use JointGrid directly, however Seaborn then seems to to be unable to draw the hexbin plot.
joint_kws = dict(gridsize=70)
g = sns.jointplot(data=all_data, x="Minute of Hour", y="Frequency", kind="hex", joint_kws=joint_kws)
plt.ylim([49.9, 50.1])
plt.xlim([0, 60])
g.ax_joint.axvline(x=30,ymin=49, ymax=51)
plt.show()
plt.close()
How to remove the margin plot over the x-axis?
Why is the vertical line not drawn?
Also is there a way to exchange the right margin to a plot which more clearly resembles the density?
edit: Here is a sample of the dataset (33kB). Read it with pd.read_pickle("./data.pickle")
I've been fiddling with an analog problem (using a scatterplot instead of the hexbin). In the end, the solution to your first point is awkwardly simple. Just add this line :
g.ax_marg_x.remove()
Regarding your second point, I've no clue as to why no line is plotted. But a workaround seems to be to use vlines instead :
g.ax_joint.vlines(x=30, ymin=49, ymax=51)
Concerning your last point, I'm afraid I haven't understood it. If you mean increasing/reducing the margin between the subplots, you can use the space argument stated in the doc.

Matplotlib/Seaborn: Boxplot collapses on x axis

I am creating a series of boxplots in order to compare different cancer types with each other (based on 5 categories). For plotting I use seaborn/matplotlib. It works fine for most of the cancer types (see image right) however in some the x axis collapses slightly (see image left) or strongly (see image middle)
https://i.imgur.com/dxLR4B4.png
Looking into the code how seaborn plots a box/violin plot https://github.com/mwaskom/seaborn/blob/36964d7ffba3683de2117d25f224f8ebef015298/seaborn/categorical.py (line 961)
violin_data = remove_na(group_data[hue_mask])
I realized that this happens when there are too many nans
Is there any possibility to prevent this collapsing by code only
I do not want to modify my dataframe (replace the nans by zero)
Below you find my code:
boxp_df=pd.read_csv(pf_in,sep="\t",skip_blank_lines=False)
fig, ax = plt.subplots(figsize=(10, 10))
sns.violinplot(data=boxp_df, ax=ax)
plt.xticks(rotation=-45)
plt.ylabel("label")
plt.tight_layout()
plt.savefig(pf_out)
The output is a per cancer type differently sized plot
(depending on if there is any category completely nan)
I am expecting each plot to be in the same width.
Update
trying to use the order parameter as suggested leads to the following output:
https://i.imgur.com/uSm13Qw.png
Maybe this toy example helps ?
|Cat1|Cat2|Cat3|Cat4|Cat5
|3.93| |0.52| |6.01
|3.34| |0.89| |2.89
|3.39| |1.96| |4.63
|1.59| |3.66| |3.75
|2.73| |0.39| |2.87
|0.08| |1.25| |-0.27
Update
Apparently, the problem is not the data but the length of the title
https://github.com/matplotlib/matplotlib/issues/4413
Therefore I would close the question
#Diziet should I delete it or does my issue might help other ones?
Sorry for not including the line below in the code example:
ax.set_title("VERY LONG TITLE", fontsize=20)
It's hard to be sure without data to test it with, but I think you can pass the names of your categories/cancers to the order= parameter. This forces seaborn to use/display those, even if they are empty.
for instance:
tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", data=tips, order=['Thur','Fri','Sat','Freedom Day','Sun','Durin\'s Day'])

matplotlib/pyplot: print only ticks once in scatter plot?

I am looking for a way to clean-up the ticks in my pyplot scatter plot.
To create a scatter plot from a Pandas dataset column with strings as elements, I followed the example in [2] - and got me a nice scatter plot:
input are 10k data points where the X axis has only ~200 unique 'names', that got matched to scalars for plotting. Obviously, plotting all the 10k ticks on the x axis is a bit clocked. So, I am looking for a way, to print each unique tick only once and not for each data point?
My code looks like:
fig2 = plt.figure()
WNsUniques, WNs = numpy.unique(taskDataFrame['modificationhost'], return_inverse=True)
scatterWNs = fig2.add_subplot(111)
scatterWNs.scatter(WNs, taskDataFrame['cpuconsumptiontime'])
scatterWNs.set(xticks=range(len(WNsUniques)), xticklabels=WNsUniques)
plt.xticks(rotation='vertical')
plt.savefig("%s_WNs-CPUTime_scatter.%s" % (dfName,"pdf"))
actually, I was hoping that setting the plot x ticks to the unique names should be sufficient - but apparently not? Probably it is something easy, but how do I reduce the ticks for my subplot to unique once (should they not already be uniqueified as returned by numpy.unique?)?
Maybe someone has an idea for me?
Cheers ans thanks,
Thomas
You can use the set_xticks method to accomplish this. Note that 200 axis ticks with labels are still quite a lot to force on a small plot like this, and this is what you might already be seeing with the above code. Without complete code to play with, I can't say for sure.
Additionally, what is the size of WNsUniques? That can easily be used to check if your call to unique is doing what you think.

how to shift x axis labesl on line plot?

I'm using pandas to work with a data set and am tring to use a simple line plot with error bars to show the end results. It's all working great except that the plot looks funny.
By default, it will put my 2 data groups at the far left and right of the plot, which obscures the error bar to the point that it's not useful (the error bars in this case are key to intpretation so I want them plainly visible).
Now, I fix that problem by setting xlim to open up some space on either end of the x axis so that the error bars are plainly visible, but then I have an offset from where the x labels are to where the actual x data is.
Here is a simplified example that shows the problem:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df6 = pd.DataFrame( [-0.07,0.08] , index = ['A','B'])
df6.plot(kind='line', linewidth=2, yerr = [ [0.1,0.1],[0.1,0.1 ] ], elinewidth=2,ecolor='green')
plt.xlim(-0.2,1.2) # Make some room at ends to see error bars
plt.show()
I tried to include a plot (image) showing the problem but I cannot post images yet, having just joined up and do not have anough points yet to post images.
What I want to know is: How do I shift these labels over one tick to the right?
Thanks in advance.
Well, it turns out I found a solution, which I will jsut post here in case anyone else has this same issue in the future.
Basically, it all seems to work better in the case of a line plot if you just specify both the labels and the ticks in the same place at the same time. At least that was helpful for me. It sort of forces you to keep the length of those two lists the same, which seems to make the assignment between ticks and labels more well behaved (simple 1:1 in this case).
So I coudl fix my problem by including something like this:
plt.xticks([0, 1], ['A','B'] )
right after the xlim statement in code from original question. Now the A and B align perfectly with the place where the data is plotted, not offset from it.
Using above solution it works, but is less good-looking since now the x grid is very coarse (this is purely and aesthetic consideration). I could fix that by using a different xtick statement like:
plt.xticks([-0.2, 0, 0.2, 0.4, 0.6, 0.8, 1.0], ['','A','','','','','B',''])
This gives me nice looking grid and the data where I need it, but of course is very contrived-looking here. In the actual program I'd find a way to make that less clunky.
Hope that is of some help to fellow seekers....

matplotlib pyplot side-by-side graphics

I'm trying to put two scatterplots side-by-side in the same figure. I'm also using prettyplotlib to make the graphs look a little nicer. Here is the code
fig, ax = ppl.subplots(ncols=2,nrows=1,figsize=(14,6))
for each in ['skimmer','dos','webapp','losstheft','espionage','crimeware','misuse','pos']:
ypos = df[df['pattern']==each]['ypos_m']
xpos = df[df['pattern']==each]['xpos_m']
ax[0] = ppl.scatter(ypos,xpos,label=each)
plt.title("Multi-dimensional Scaling: Manhattan")
for each in ['skimmer','dos','webapp','losstheft','espionage','crimeware','misuse','pos']:
ypos = df[df['pattern']==each]['ypos_e']
xpos = df[df['pattern']==each]['xpos_e']
ax[1] = ppl.scatter(ypos,xpos,label=each)
plt.title("Multi-dimensional Scaling: Euclidean")
plt.show()
I don't get any error when the code runs, but what I end up with is one row with two graphs. One graph is completely empty and not styled by prettyplotlib at all. The right side graphic seems to have both of my scatterplots in it.
I know that ppl.subplots is returning a matplotlib.figure.Figure and a numpy array consisting of two matplotlib.axes.AxesSubplot. But I also admit that I don't quite get how axes and subplotting works. Hopefully it's just a simple mistake somewhere.
I think ax[0] = ppl.scatter(ypos,xpos,label=each) should be ax[0].scatter(ypos,xpos,label=each) and ax[1] = ppl.scatter(ypos,xpos,label=each) should be ax[1].scatter(ypos,xpos,label=each), change those and see if your problem get solved.
I am quite sure that the issue is: you are calling ppl.scatter(...), which will try to draw on the current axis, which is the 1st axes of 2 axes you generated (and it is the left one)
Also you may find that in the end, the ax list contains two matplotlib.collections.PathCollections, bot two axis as you may expect.
Since the solution above removes the prettiness of prettyplot, we shall use an alternative solution, which is to change the current working axis, by adding:
plt.sca(ax[0_or_1])
Before ppl.scatter(), inside each loop.