make specific data points in scatter plot seaborn more visible [duplicate] - pandas

I have a Seaborn scatterplot and am trying to control the plotting order with 'hue_order', but it is not working as I would have expected (I can't get the blue dot to show on top of the gray).
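import pandas as pd
import seaborn as sns

x = [1, 2, 3, 1, 2, 3]
cat = ['N', 'Y', 'N', 'N', 'N']
# Note: zip() stops at the shorter list, so the frame has 5 rows
test = pd.DataFrame(list(zip(x, cat)),
                    columns=['x', 'cat'])
display(test)  # Jupyter/IPython only
colors = {'N': 'gray', 'Y': 'blue'}
sns.scatterplot(data=test, x='x', y='x',
                hue='cat', hue_order=['Y', 'N'],
                palette=colors)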
Flipping the 'hue_order' to hue_order=['N', 'Y', ] doesn't change the plot. How can I get the 'Y' category to plot on top of the 'N' category? My actual data has duplicate x,y ordinates that are differentiated by the category column.

The reason this is happening is that, unlike most plotting functions, scatterplot doesn't (internally) iterate over the hue levels when it's constructing the plot. It draws a single scatterplot and then sets the color of the elements with a vector. It does this so that you don't end up with all of the points from the final hue level on top of all the points from the penultimate hue level on top of all the ... etc. But it means that the scatterplot z-ordering is insensitive to the hue ordering and reflects only the order in the input data.
So you could use your desired hue order to sort the input data:
import numpy as np

hue_order = ["N", "Y"]
colors = {'N': 'gray', 'Y': 'blue'}
sns.scatterplot(
    # Sort rows by their position in hue_order so that 'Y' is drawn last
    data=test.sort_values('cat', key=np.vectorize(hue_order.index)),
    x='x', y='x',
    hue='cat', hue_order=hue_order,
    palette=colors, s=100,  # Embiggen the points to see what's happening
)
There may be a more efficient way to do that "sort by list of unique values" built into pandas; I am not sure.
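One pandas-native way to do that "sort by list of unique values" is an ordered categorical. This is only a minimal sketch, assuming the same toy data as in the question (the frame is rebuilt here so the snippet runs on its own; `test_sorted` is just an illustrative name):

```python
import pandas as pd

hue_order = ["N", "Y"]  # the last entry ends up drawn on top

# Toy data mirroring the question (5 rows; the point at x=2 is duplicated)
test = pd.DataFrame({"x": [1, 2, 3, 1, 2],
                     "cat": ["N", "Y", "N", "N", "N"]})

# An ordered categorical sorts by the category order, not alphabetically
test["cat"] = pd.Categorical(test["cat"], categories=hue_order, ordered=True)
test_sorted = test.sort_values("cat", kind="stable")

print(test_sorted["cat"].tolist())  # ['N', 'N', 'N', 'N', 'Y']
```

Passing `test_sorted` as `data=` to `sns.scatterplot` should then draw the 'Y' row last, since the z-order follows the row order.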

TLDR: Before plotting, sort the data so that the dominant color appears last in the data. Here, it could just be:
test = test.sort_values('cat') # ascending = True
Then the blue 'Y' point is drawn on top of the gray 'N' point.
It seems that hue_order doesn't affect the order (or z-order) in which things are plotted. Rather, it affects how colors are assigned. E.g., if you don't specify a specific mapping of categories to colors (i.e. you just use a list of colors or a color palette), this parameter can determine whether 'N' or 'Y' gets the first (and which gets the second) color of the palette. There's an example showing this behavior here in the hue_order section. When you have the dict already linking categories to colors (colors = {'N': 'gray', 'Y': 'blue'}), it seems to just affect the order of labels in the legend, as you probably have seen.
So the key is to make sure the color you want on top is plotted last (and thus "on top"). I would have also assumed the hue_order parameter would do as you expected, but apparently not!

Related

How to use the parameter "annot_kws" of the function "sns.heatmap" to revise the annotation text?

How can I draw such a heatmap using the "seaborn.heatmap" function?
The color shades are determined by matrix A and the annotation of each grid is determined by matrix B.
For example, if I get a matrix, I want its color to be displayed according to the z-score of this matrix, but the annotation remains the matrix itself.
I know I should resort to the parameter 'annot_kws', but how exactly should I write the code?
Instead of simply setting annot=True, annot= can be set to a dataframe (or 2D numpy array, or a list of lists) with the same number of rows and columns as the data. That way, the coloring will be applied using the data, and the annotation will come from annot. Seaborn will still take care to use white text for the dark cells and black text for the light ones.
annot_kws= is used to change the text properties, typically the fontsize. But you also could change the font itself, or the alignment if you'd used multiline text.
Here is an example using the numbers 1 to 36 as annotations, but the numbers modulo 10 for the coloring. The annot_kws are used to enlarge and rotate the text. (Note that when the annotations are strings, you also need to set the format, e.g. fmt='').
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

matrix_B = np.arange(1, 37).reshape(6, 6)  # used for annotations
matrix_A = matrix_B % 10  # used for coloring
sns.heatmap(data=matrix_A, annot=matrix_B,
            annot_kws={'size': 20, 'rotation': 45},
            square=True, cbar_kws={'label': 'last digit'})
plt.show()
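For the z-score case from the question, the same idea applies: color by the standardized matrix and annotate with the original one. A rough sketch with made-up data (the variable names and the choice of a whole-matrix z-score are assumptions, not something the question specifies):

```python
import numpy as np
import seaborn as sns

rng = np.random.default_rng(0)
matrix = rng.integers(0, 100, size=(4, 5))  # made-up values, shown as annotations

# z-score over the whole matrix (pass axis=0 or axis=1 to mean/std,
# with keepdims=True, for per-column or per-row scores instead)
z = (matrix - matrix.mean()) / matrix.std()

# Colors come from z, the cell text comes from the original integers
ax = sns.heatmap(data=z, annot=matrix, fmt="d",
                 cbar_kws={"label": "z-score"})
```

Call plt.show() or ax.figure.savefig(...) afterwards to see the result.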

map the colors of a boxplot with values using seaborn

I am making a boxplot with seaborn based on some values, and I want to map the box colors to another variable. I am doing this:
plt.figure(figsize=(20,12))
sns.boxplot(y='name',x='value',data=df,showfliers=False,orient="h")
The result is boxplots with arbitrary colors. I want the colors to be defined by the value of a third column in the dataframe. The only thing I could find is the hue parameter, but it splits the data into more boxplots, which is not what I want.
You can indeed specify the color with hue:
sns.boxplot(x='name', y='value', data=df,
            hue='name',  # same column as `x`
            palette={'A': 'r', 'B': 'b'},  # map each name to a color
            )
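If the color should follow a third column, the palette dict can be built from that column instead of being typed by hand. A minimal sketch, assuming each name belongs to exactly one value of the third column; the column name `group`, the data, and the colors are all made up for illustration:

```python
import pandas as pd
import seaborn as sns

# Made-up data; `group` stands in for the third column from the question
df = pd.DataFrame({
    "name":  ["A", "A", "A", "B", "B", "B", "C", "C", "C"],
    "value": [1.0, 2.0, 3.0, 2.0, 3.5, 5.0, 0.5, 1.5, 2.5],
    "group": ["x", "x", "x", "y", "y", "y", "x", "x", "x"],
})

group_colors = {"x": "tab:blue", "y": "tab:orange"}  # one color per group

# Each name has a single group here, so map name -> group -> color
name_to_group = df.drop_duplicates("name").set_index("name")["group"]
palette = name_to_group.map(group_colors).to_dict()

# hue repeats the categorical column; dodge=False keeps one box per name
ax = sns.boxplot(data=df, y="name", x="value", hue="name", dodge=False,
                 palette=palette, showfliers=False, orient="h")
if ax.get_legend() is not None:
    ax.get_legend().remove()  # the legend would only repeat the axis labels
```

Boxes for 'A' and 'C' then share one color and 'B' gets the other, without splitting any box.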

Matplotlib/Seaborn: Boxplot collapses on x axis

I am creating a series of boxplots in order to compare different cancer types with each other (based on 5 categories). For plotting I use seaborn/matplotlib. It works fine for most of the cancer types (see image right) however in some the x axis collapses slightly (see image left) or strongly (see image middle)
https://i.imgur.com/dxLR4B4.png
Looking at how seaborn plots a box/violin plot in https://github.com/mwaskom/seaborn/blob/36964d7ffba3683de2117d25f224f8ebef015298/seaborn/categorical.py (line 961):
violin_data = remove_na(group_data[hue_mask])
I realized that this happens when there are too many NaNs.
Is there any way to prevent this collapsing by code only?
I do not want to modify my dataframe (replace the NaNs by zero).
Below you find my code:
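import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# pf_in / pf_out are the input and output file paths
boxp_df = pd.read_csv(pf_in, sep="\t", skip_blank_lines=False)
fig, ax = plt.subplots(figsize=(10, 10))
sns.violinplot(data=boxp_df, ax=ax)
plt.xticks(rotation=-45)
plt.ylabel("label")
plt.tight_layout()
plt.savefig(pf_out)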
The output is a per cancer type differently sized plot
(depending on if there is any category completely nan)
I am expecting each plot to be in the same width.
Update
trying to use the order parameter as suggested leads to the following output:
https://i.imgur.com/uSm13Qw.png
Maybe this toy example helps ?
|Cat1|Cat2|Cat3|Cat4|Cat5
|3.93| |0.52| |6.01
|3.34| |0.89| |2.89
|3.39| |1.96| |4.63
|1.59| |3.66| |3.75
|2.73| |0.39| |2.87
|0.08| |1.25| |-0.27
Update
Apparently, the problem is not the data but the length of the title:
https://github.com/matplotlib/matplotlib/issues/4413
Therefore I would close the question.
@Diziet, should I delete it, or might my issue help others?
Sorry for not including the line below in the code example:
ax.set_title("VERY LONG TITLE", fontsize=20)
It's hard to be sure without data to test it with, but I think you can pass the names of your categories/cancers to the order= parameter. This forces seaborn to use/display those, even if they are empty.
for instance:
tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", data=tips, order=['Thur','Fri','Sat','Freedom Day','Sun','Durin\'s Day'])

How to 'bottom' align within matplotlib multi column legend?

Given a multi column legend created like this:
plot.fig.legend(handles, labels, ncol=2 ....)
Is there a way to define the inner alignment?
I'm getting a legend that has a 'top' alignment:
But I wish for a 'bottom' alignment like this:
There is a similar question that never received an answer here:
aligning matplotlib subplots legends
The question that @Sheldore linked to was different, but its solution is similar.
I could adapt it for my issue.
handles.insert(2, plt.plot([], [], color=(0, 0, 0, 0), label=" ")[0])
labels.insert(2, '')
plot.fig.legend(handles, labels, ncol=2, ....)
The idea is that the legend is laid out like a table. If you want an element to move to a different position, you have to insert an empty entry at the other positions.
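A self-contained version of that trick, as a sketch with made-up data. It assumes three real entries and ncol=2, and it uses an invisible Line2D as the filler instead of the plt.plot call above:

```python
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

fig, ax = plt.subplots()
for i in range(3):
    ax.plot([0, 1], [i, i + 1], label=f"series {i}")  # made-up data

handles, labels = ax.get_legend_handles_labels()

# With ncol=2, three entries are filled column by column: two in the first
# column and one at the top of the second. An invisible entry at index 2
# pushes the last real entry to the bottom of the second column.
handles.insert(2, Line2D([], [], color="none"))
labels.insert(2, "")

legend = fig.legend(handles, labels, ncol=2)
```

For other entry counts or column counts, the filler index has to be chosen so that the blanks land at the top of the shorter columns.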

geometry of colorbars in matplotlib

Plotting a figure with a colorbar, like for example the ellipse collection of the matplotlib gallery, I'm trying to understand the geometry of the figure. If I add the following code in the source code (instead of plt.show()):
cc=plt.gcf().get_children()
print(cc[1].get_geometry())
print(cc[2].get_geometry())
I get
(1, 2, 1)
(3, 1, 2)
I understand the first one - 1 row, two columns, plot first (and presumably the second is the colorbar), but I don't understand the second one, which I would expect to be (1,2,2). What do these values correspond to?
Edit: It seems that the elements in cc do not have the same axes, which would explain the discrepancies. Still, I'm confused by the geometries that are reported.
What's happening is that when you call colorbar, use_gridspec defaults to True, which leads to a call to matplotlib.colorbar.make_axes_gridspec. That function creates a 1 by 2 grid to hold the plot and colorbar axes; the colorbar axes itself then sits in a 3 by 1 grid whose aspect ratio is adjusted.
The key line in matplotlib.colorbar.make_axes_gridspec that makes this happen is
gs2 = gs_from_sp_spec(3, 1, subplot_spec=gs[1], hspace=0.,
                      height_ratios=wh_ratios)
By default wh_ratios == [0.0, 1.0, 0.0], so the two subplots above and below have 0 times the height of the middle one.
I've put what I did to figure this out into an IPython notebook
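To inspect the nested grids directly, something like the following sketch may help. As far as I know, Axes.get_geometry() from the question is no longer available in recent Matplotlib releases, so this goes through the SubplotSpec instead; the image data is made up, and the exact numbers reported may differ between versions:

```python
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots()
im = ax.imshow(np.arange(12).reshape(3, 4))  # made-up image data
cbar = fig.colorbar(im, ax=ax)  # use_gridspec defaults to True

# SubplotSpec.get_geometry() returns (nrows, ncols, start, stop)
main_geom = ax.get_subplotspec().get_geometry()
cbar_geom = cbar.ax.get_subplotspec().get_geometry()
print("main axes:    ", main_geom)
print("colorbar axes:", cbar_geom)
```

The first two numbers of each tuple are the grid shape, which is where the 1 by 2 and 3 by 1 grids described above should show up.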