Why is the point size with sns.lmplot different from plt.scatter? - pandas

I want to make a scatterplot of the x and y variables, where the point size depends on a numeric variable and the color of each point depends on a categorical variable.
First, I tried this with plt.scatter:
Graph 1
Then I tried the same thing with lmplot, but the point sizes differ from those in the first graph.
I expected the two graphs to be identical. Why aren't they?
Graph 2

Your question is not very descriptive, but I guess you want to control the size of the markers. The seaborn scatterplot documentation is a good starting point:
A numeric variable can also be assigned to size to apply a semantic mapping to the areas of the points:
import seaborn as sns
tips = sns.load_dataset("tips")
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="size", size="size")
For a plain seaborn scatterplot:
df = sns.load_dataset("anscombe")
sp = sns.scatterplot(x="x", y="y", hue="dataset", data=df)
To give all points one fixed size, use the s parameter (the marker area in points squared, as in plt.scatter):
sp = sns.scatterplot(x="x", y="y", hue="dataset", data=df, s=100)
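A likely reason for the mismatch (my assumption, since the code behind the two graphs isn't shown) is that plt.scatter uses the numeric column directly as the marker area in points squared, whereas seaborn's size= semantic first rescales the values into a default range derived from matplotlib's default marker size. The sketch below uses hypothetical column names (weight, group) and draws both versions side by side; passing sizes=(min, max) of the data makes seaborn's linear rescaling an identity, so both panels should end up with the same marker areas.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical toy data: "weight" is the numeric size variable,
# "group" is the categorical color variable
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5, 6],
    "y": [2, 4, 1, 5, 3, 6],
    "weight": [10, 40, 90, 160, 250, 360],
    "group": ["a", "b", "a", "b", "a", "b"],
})
colours = {"a": "tab:blue", "b": "tab:orange"}

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4), sharey=True)

# plt.scatter: s is used as-is, as the marker area in points**2
ax1.scatter(df["x"], df["y"], s=df["weight"],
            c=df["group"].map(colours).tolist())

# sns.scatterplot: size= is rescaled linearly into the `sizes` range;
# using the data's own min and max turns that rescaling into an identity
sns.scatterplot(data=df, x="x", y="y", hue="group", size="weight",
                sizes=(df["weight"].min(), df["weight"].max()),
                palette=colours, legend=False, ax=ax2)

same = np.allclose(ax1.collections[0].get_sizes(),
                   ax2.collections[0].get_sizes())
print(same)
```

Without the sizes= argument, seaborn picks its own range, which is why the same column can produce visibly different point sizes in the two libraries.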

Related

Specify Matplotlib's kwargs to Seaborn's displot when hue is used

Suppose we have this:
import seaborn as sns
import pandas as pd
import numpy as np
samples = 2**13
data = pd.DataFrame({'Values': list(np.random.normal(size=samples)) + list(np.random.uniform(size=samples)),
'Kind': ['Normal'] * samples + ['Uniform'] * samples})
sns.displot(data, hue='Kind', x='Values', fill=True)
I want the Normal histogram (or KDE) emphasized: red, opaque and in the background. Uniform should have alpha = .5.
How do I specify these style parameters in a "per hue" manner?
It's possible to do it with two separate histplots on the same Axes, as @Redox suggested. This basically recreates the same plot, but with fine-grained control over colours and alpha. However, I had to pass the number of bins explicitly to get the same plot as yours. I also needed to define the colour for Uniform, otherwise a ghost element would be added to the legend! I used C1, the second colour of the default cycle (C0 is the first).
import matplotlib.pyplot as plt
_, ax = plt.subplots()
sns.histplot(data=data[data.Kind == 'Normal'], x="Values", ax=ax, label='Normal', color='tab:red', bins=130, alpha=1)
sns.histplot(data=data[data.Kind == 'Uniform'], x="Values", ax=ax, label='Uniform', color='C1', bins=17, alpha=.5)
ax.set_xlabel('')
ax.legend()
Note that if you just want to set the colour without alpha, you can already do this on a displot via the palette argument: pass in a dictionary mapping each unique hue value to a colour name. However, the alpha that you pass in must be a scalar. I tried to use a clever answer that sets colours as RGBA colours, which include alpha, and that seems to work with other figure-level plots in Seaborn. However, displot overrides this and sets the alpha separately!

Matplotlib/Seaborn: Boxplot collapses on x axis

I am creating a series of boxplots in order to compare different cancer types with each other (based on 5 categories). For plotting I use seaborn/matplotlib. It works fine for most of the cancer types (see the right image); however, for some of them the x axis collapses slightly (see the left image) or strongly (see the middle image).
https://i.imgur.com/dxLR4B4.png
Looking at how seaborn draws a box/violin plot in https://github.com/mwaskom/seaborn/blob/36964d7ffba3683de2117d25f224f8ebef015298/seaborn/categorical.py (line 961):
violin_data = remove_na(group_data[hue_mask])
I realized that this happens when there are too many NaNs.
Is there any way to prevent this collapse in code only?
I do not want to modify my dataframe (e.g. replace the NaNs with zero).
Below you find my code:
boxp_df=pd.read_csv(pf_in,sep="\t",skip_blank_lines=False)
fig, ax = plt.subplots(figsize=(10, 10))
sns.violinplot(data=boxp_df, ax=ax)
plt.xticks(rotation=-45)
plt.ylabel("label")
plt.tight_layout()
plt.savefig(pf_out)
The output is a plot whose width differs per cancer type
(depending on whether any category is entirely NaN).
I expect every plot to have the same width.
Update
Trying the order parameter as suggested leads to the following output:
https://i.imgur.com/uSm13Qw.png
Maybe this toy example helps?
| Cat1 | Cat2 | Cat3 | Cat4 | Cat5  |
| 3.93 |      | 0.52 |      | 6.01  |
| 3.34 |      | 0.89 |      | 2.89  |
| 3.39 |      | 1.96 |      | 4.63  |
| 1.59 |      | 3.66 |      | 3.75  |
| 2.73 |      | 0.39 |      | 2.87  |
| 0.08 |      | 1.25 |      | -0.27 |
Update
Apparently, the problem is not the data but the length of the title
https://github.com/matplotlib/matplotlib/issues/4413
Therefore I would close the question.
@Diziet, should I delete it, or might my issue help others?
Sorry for not including the line below in the code example:
ax.set_title("VERY LONG TITLE", fontsize=20)
It's hard to be sure without data to test it with, but I think you can pass the names of your categories/cancers to the order= parameter. This forces seaborn to use/display those, even if they are empty.
For instance:
import seaborn as sns
tips = sns.load_dataset("tips")
ax = sns.violinplot(x="day", y="total_bill", data=tips, order=['Thur','Fri','Sat','Freedom Day','Sun','Durin\'s Day'])
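The same idea applied to the question's toy data, as a sketch: I melt the wide frame to long form (my choice, so that order= can name every column, including the all-NaN Cat2 and Cat4) and pass the full column list as order. Each category should then keep its own slot, even the empty ones; whether that alone fixes the width problem is a separate question, since the author later traced it to the title.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

nan = np.nan
# The toy example from the question: Cat2 and Cat4 are entirely NaN
toy = pd.DataFrame({
    "Cat1": [3.93, 3.34, 3.39, 1.59, 2.73, 0.08],
    "Cat2": [nan] * 6,
    "Cat3": [0.52, 0.89, 1.96, 3.66, 0.39, 1.25],
    "Cat4": [nan] * 6,
    "Cat5": [6.01, 2.89, 4.63, 3.75, 2.87, -0.27],
})

# Long form: one row per (category, value); the data itself is not modified
long_df = toy.melt(var_name="category", value_name="value")

fig, ax = plt.subplots(figsize=(10, 10))
sns.violinplot(data=long_df, x="category", y="value",
               order=list(toy.columns), ax=ax)  # keep a slot for every column
fig.canvas.draw()  # make sure the tick labels are populated
```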

Change marker size in Seaborn Factorplot

I'm trying to change the marker size in Seaborn factorplots, but I am not sure what keyword argument to pass.
import seaborn as sns
exercise = sns.load_dataset("exercise")
g = sns.factorplot(x="time", y="pulse", hue="kind", data=exercise, ci= .95)
I tried passing markersize and s based on these Stack Overflow answers, but neither seems to have an effect:
pyplot scatter plot marker size
By default, factorplot calls the underlying function pointplot, which accepts the argument markers. This is used to differentiate the marker shapes. The size of all lines and markers can be changed with the scale argument.
import seaborn as sns
exercise = sns.load_dataset("exercise")
g = sns.factorplot(x="time", y="pulse", hue="kind", data=exercise, ci=95,
                   markers=['o', 'v', 's'],
                   scale=1.5)
Same data as above with different shapes
Please also note the ci argument in your example: it is given as a percentage, so ci=.95 draws a 0.95% confidence interval, which is hardly visible. Use ci=95 instead.
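factorplot was renamed to catplot in seaborn 0.9, and recent versions no longer provide the old name. catplot defaults to a strip plot, so an updated version of the answer needs kind="point". A sketch with a made-up stand-in for the exercise dataset (same column names, invented pulse values, so it runs offline). Note that in seaborn 0.13 scale is deprecated with a warning in favour of passing matplotlib line properties such as markersize directly; I kept scale because it also works on older releases.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import pandas as pd
import seaborn as sns

# Made-up stand-in for the "exercise" dataset (same column names)
exercise = pd.DataFrame({
    "time": ["1 min", "15 min", "30 min"] * 6,
    "kind": ["rest"] * 6 + ["walking"] * 6 + ["running"] * 6,
    "pulse": [85, 86, 88, 90, 92, 93,
              95, 100, 103, 98, 104, 106,
              110, 125, 140, 112, 130, 145],
})

g = sns.catplot(x="time", y="pulse", hue="kind", data=exercise,
                kind="point",             # catplot would otherwise draw a strip plot
                markers=["o", "v", "s"],  # one marker shape per hue level
                scale=1.5)                # enlarges markers and lines together
```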

Multiplot with matplotlib without knowing the number of plots before running

I have a problem with Matplotlib's subplots. I do not know beforehand how many subplots I want to plot, but I know that I want them in two rows, so I cannot use
plt.subplot(212)
because I don't know the number that I should provide.
It should look like this:
Right now, I save all the plots into a folder and put them together in Illustrator, but there has to be a better way with Matplotlib. I can provide my code if I was unclear somewhere.
My understanding is that you only know the number of plots at runtime and hence are struggling with the shorthand syntax, e.g.:
plt.subplot(121)
Thankfully, to save you having to do some awkward math to build that number programmatically, there is another form that takes the three numbers as separate arguments:
plt.subplot(n_rows, n_cols, index)
where index starts at 1 in the upper left corner and increases to the right, row by row. So in your case, given you want n plots in two rows, you can do:
import math
import matplotlib.pyplot as plt
n_plots = 5  # (or however many you programmatically figure out you need)
n_rows = 2
n_cols = math.ceil(n_plots / n_rows)
for plot_num in range(n_plots):
    ax = plt.subplot(n_rows, n_cols, plot_num + 1)  # index is 1-based
    ax.plot([0, 1], [0, plot_num])  # ... do some plotting
Alternatively, there is also a slightly more pythonic interface which you may wish to be aware of:
fig, axes = plt.subplots(n_rows, n_cols)
for ax in axes.flat[:n_plots]:  # axes is a 2-D array, so flatten it first
    ax.plot([0, 1], [0, 1])  # ... do some plotting
for ax in axes.flat[n_plots:]:
    ax.set_visible(False)  # hide the unused slot when n_plots is odd
(Note that this is subplots(), not plain subplot().) I must admit, though, that I have never used this latter interface.
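If this comes up repeatedly, the grid setup can be wrapped in a small function. A sketch; two_row_axes is a hypothetical helper name, and squeeze=False is used so that plt.subplots always returns a 2-D array, even when there is only one column.

```python
import math
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt


def two_row_axes(n_plots):
    """Hypothetical helper: a 2-row grid with exactly n_plots visible Axes."""
    n_rows = 2
    n_cols = max(1, math.ceil(n_plots / n_rows))
    fig, axes = plt.subplots(n_rows, n_cols, squeeze=False,
                             figsize=(3 * n_cols, 6))
    flat = axes.ravel()  # row-major: fills the top row first
    for ax in flat[n_plots:]:
        ax.set_visible(False)  # hide the spare slot when n_plots is odd
    return fig, list(flat[:n_plots])


fig, axes = two_row_axes(5)
for i, ax in enumerate(axes):
    ax.plot(range(10), [x * i for x in range(10)])
    ax.set_title(f"plot {i + 1}")
```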
HTH

Problems with zeros in matplotlib.colors.LogNorm

I am plotting a histogram using
plt.imshow(hist2d, norm = LogNorm(), cmap = gray)
where hist2d is a matrix of histogram values. This works fine except for the elements of hist2d that are zero: they come out as white patches in the image,
but I would like them to be black.
Thank you!
Here's an alternative method that does not require you to modify your data: set an RGB colour for 'bad' pixels instead.
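import copy
import numpy as np
import matplotlib.colors
import matplotlib.pyplot as plt
data = np.arange(25).reshape((5, 5))  # contains a 0, which LogNorm cannot map
my_cmap = copy.copy(plt.get_cmap('gray'))  # copy so the registered cmap is untouched
my_cmap.set_bad((0, 0, 0))  # draw 'bad' (masked) pixels in black
plt.imshow(data,
           norm=matplotlib.colors.LogNorm(),
           interpolation='nearest',
           cmap=my_cmap)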
The problem is that bins with 0 cannot be log-normalized, so they are flagged as 'bad', and bad values are mapped to a separate colour. By default that colour is fully transparent, so nothing is drawn on those pixels. You can also specify the colour for pixels that are over or under the limits of the colormap, with set_over and set_under (by default they are drawn in the highest/lowest colour).
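To see the three special colours side by side, here is a small sketch using the Colormap methods set_bad, set_under and set_over together with a LogNorm that has fixed limits (vmin=1 and vmax=10 are arbitrary). The three probe values should land in the bad, under and over slots respectively.

```python
import copy
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LogNorm

cmap = copy.copy(plt.get_cmap("gray"))  # work on a copy, not the registered cmap
cmap.set_bad("black")  # masked values, e.g. zeros under LogNorm
cmap.set_under("blue")  # values below vmin
cmap.set_over("red")  # values above vmax

norm = LogNorm(vmin=1, vmax=10)
# 0 is masked by LogNorm (bad), 0.5 < vmin (under), 100 > vmax (over)
rgba = cmap(norm(np.array([0.0, 0.5, 100.0])))
print(rgba)

data = np.array([[0.0, 0.5], [5.0, 100.0]])
plt.imshow(data, norm=norm, cmap=cmap, interpolation="nearest")
```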
If you're happy with the colour scaling as is, and simply want the 0 values to be black, I'd simply change the input matrix so that the 0s are replaced by the next smallest value:
import matplotlib.pyplot as plt
import matplotlib.colors
import numpy
hist2d = numpy.arange(9).reshape(3, 3)
# Next value above zero; sorted(...)[1] would still be 0 if there were several zeros
smallest_positive = hist2d[hist2d > 0].min()
plt.imshow(numpy.maximum(hist2d, smallest_positive),  # replace the 0s with it
           interpolation='nearest',
           norm=matplotlib.colors.LogNorm(),
           cmap='gray')
which produces an image in which the former 0 bins are black.
Was this generated with the matplotlib hist2d function?
All you need to do is add some arbitrary floor value to the matrix, then make sure to plot it with fixed limits:
hist2d = hist2d + 1e-3  # also works for integer counts (the result is float)
Then, when you show the figure, all of the white space will be at the floor value and will show up on the log-normalized plot. However, if you let the plotting function pick the scaling automatically, it will use the 1e-3 floor value as the minimum. To avoid this, set vmin and vmax explicitly. In recent Matplotlib versions they must go into the LogNorm itself, because passing a norm together with separate vmin/vmax raises an error:
plt.hist2d(x, y, bins=40, norm=LogNorm(vmin=1, vmax=1e4))
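The floor-plus-fixed-limits idea, sketched end to end: I build the count matrix with numpy's histogram2d (my substitution, so the floor can be added before plotting) from seeded random sample data, then draw it with imshow. Because 1e-3 is below vmin=1, the formerly empty bins fall into the colormap's 'under' slot, which by default is the lowest colour (black for gray).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import LogNorm

rng = np.random.default_rng(0)
x, y = rng.normal(size=(2, 10_000))

# Build the count matrix yourself so a floor can be added before plotting
counts, xedges, yedges = np.histogram2d(x, y, bins=40)
floored = counts + 1e-3  # every empty bin is now 1e-3 instead of 0

im = plt.imshow(floored.T, origin="lower",  # histogram2d puts x on axis 0
                extent=[xedges[0], xedges[-1], yedges[0], yedges[-1]],
                norm=LogNorm(vmin=1, vmax=1e4),  # fixed limits: 1e-3 < vmin
                cmap="gray", interpolation="nearest")
```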