map the colors of a boxplot with values using seaborn - pandas

I am making a boxplot with seaborn based on some values, and I want the box colors to depend on the values of another column. This is what I am doing:
plt.figure(figsize=(20,12))
sns.boxplot(y='name',x='value',data=df,showfliers=False,orient="h")
The result is boxplots with arbitrary colors. I want the colors to be defined according to the value of a third column in the dataframe. The only thing I could find is the "hue" argument, but it splits the data into more boxplots, which is not what I want.

You can indeed specify the color with hue:
sns.boxplot(x='name', y='value', data=df,
hue='name', # same with `x`
palette={'A':'r','B':'b'}, # specify color
)
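To colour by a third column without splitting the boxes, one option is to build the palette from that column and key it by name. The following is a minimal sketch under stated assumptions: the column `group`, its values, and the colour choices are hypothetical, and `dodge=False` is added on the assumption that older seaborn versions would otherwise offset the boxes when `hue` is set.

```python
import pandas as pd
import seaborn as sns

# Hypothetical data: 'group' stands in for the third column.
df = pd.DataFrame({
    "name": ["A"] * 4 + ["B"] * 4 + ["C"] * 4,
    "value": [1, 2, 3, 4, 2, 3, 4, 5, 0, 1, 2, 3],
    "group": ["x"] * 4 + ["y"] * 4 + ["x"] * 4,
})

# One colour per value of the third column (arbitrary choices).
group_colors = {"x": "r", "y": "b"}

# Look up each name's group, then translate it to a colour.
name_to_group = df.drop_duplicates("name").set_index("name")["group"]
palette = {name: group_colors[group] for name, group in name_to_group.items()}

# hue is the same column as the categorical axis, so the box count is unchanged.
ax = sns.boxplot(y="name", x="value", data=df, orient="h",
                 hue="name", palette=palette, dodge=False)
```

This assumes every name maps to exactly one value of the third column; if a name spans several groups, a single box colour is not well defined.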

Related

How to overlay hatches on shapefile with condition?

I've been trying to plot hatches (a pattern like "//") on polygons of a shapefile, based on a condition. The condition is that every polygon whose "Sig" value is greater than or equal to 0.05 should get a hatch pattern. Unfortunately the resulting map doesn't meet my requirements.
So I first plot the "AMOTL" variable and then want to plot the hatches (variable Sig) on top of it (where the values are greater than or equal to 0.05). I have used the following code:
import contextily as ctx
import geopandas as gpd
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.ticker as ticker
from matplotlib.patches import Ellipse, Polygon
data = gpd.read_file("mapsignif.shp")
Sig = data.loc[data["Sig"].ge(0.05)]
data.loc[data["AMOTL"].eq(0), "AMOTL"] = np.nan
ax = data.plot(
figsize=(12, 10),
column="AMOTL",
legend=True,
cmap="bwr",
vmin = -1,
vmax= 1,
missing_kwds={"color":"white"},
)
Sig.plot(
ax=ax,
hatch='//'
)
map = Basemap(
llcrnrlon=-50,
llcrnrlat=30,
urcrnrlon=50.0,
urcrnrlat=85.0,
resolution="i",
lat_0=39.5,
lon_0=1,
)
map.fillcontinents(color="lightgreen")
map.drawcoastlines()
map.drawparallels(np.arange(10,90,20),labels=[1,1,1,1])
map.drawmeridians(np.arange(-180,180,30),labels=[1,1,0,1])
Now the problem is that my original image (on which I want to plot the hatches) is different from the image resulting from the above code:
Original Image -
Resultant image from above code:
I basically want to plot hatches on that first image. This topic is similar to correlation plots where you have places with hatches (if the p-value is greater than 0.05). The first image plots the correlation variable and some of them are significant (defined by Sig). So I want to plot the Sig variable on top of the AMOTL. I've tried variations of the code, but still can't get through.
Would be grateful for some assistance... Here's my file - https://drive.google.com/file/d/10LPNjBtQMdQMw6XmXdJEg6Uq4icx_LD6/view?usp=sharing
I’d bet this is the culprit:
data.loc[data["Sig"].ge(0.05), "Sig"].plot(
column="Sig", hatch='//'
)
In this line, you’re selecting only the 'Sig' column, eliminating all spatial data in the 'geometry' column and returning a pandas.Series instead of a geopandas.GeoDataFrame. In order to plot a data column using the geometries column for your shapes you must maintain at least both of those columns in the object you call .plot on.
So instead, don’t select the column:
data.loc[data["Sig"].ge(0.05)].plot(
column="Sig", hatch='//'
)
You are already telling geopandas to plot the "Sig" column by using the column argument to .plot - no need to limit the actual data too.
Also, when overlaying a plot on an existing axis, be sure to pass in the axis object:
data.loc[data["Sig"].ge(0.05)].plot(
column="Sig", hatch='//', ax=ax
)
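The overlay can be tried without the shapefile. This is only a sketch: the four squares and their values are made up to stand in for mapsignif.shp, and `facecolor="none"` is my addition so that the hatched layer does not paint over the AMOTL colours underneath.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical stand-in for the shapefile: four unit squares in a row.
data = gpd.GeoDataFrame(
    {"AMOTL": [-0.8, -0.2, 0.3, 0.9], "Sig": [0.01, 0.20, 0.04, 0.50]},
    geometry=[box(i, 0, i + 1, 1) for i in range(4)],
)

# Base layer: colour each polygon by AMOTL.
ax = data.plot(column="AMOTL", cmap="bwr", vmin=-1, vmax=1,
               legend=True, figsize=(8, 3))

# Row filter only, so the geometry column is kept.
sig = data.loc[data["Sig"].ge(0.05)]

# Transparent faces with hatching, drawn on the same axes.
sig.plot(ax=ax, facecolor="none", edgecolor="black", hatch="//")
```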

Specify Matplotlib's kwargs to Seaborn's displot when hue is used

Suppose we have this:
import seaborn as sns
import pandas as pd
import numpy as np
samples = 2**13
data = pd.DataFrame({'Values': list(np.random.normal(size=samples)) + list(np.random.uniform(size=samples)),
'Kind': ['Normal'] * samples + ['Uniform'] * samples})
sns.displot(data, hue='Kind', x='Values', fill=True)
I want my Normal's histogram (or KDE) emphasized. I'd like it in red and non transparent in the background. Uniform should have alpha = .5.
How do I specify these style parameters in a "per hue" manner?
It's possible to do it with two separate histplots on the same Axes, as @Redox suggested. We can basically recreate the same plot, but with fine-grained control over colours and alpha. However, I had to pass the number of bins explicitly to get the same plot as yours. I also needed to define the colour for Uniform, otherwise a ghost element would be added to the legend! I used C1, the second colour in the default cycle (C0 is the first).
import matplotlib.pyplot as plt
_, ax = plt.subplots()
sns.histplot(data=data[data.Kind=='Normal'], x="Values", ax=ax, label='Normal', color='tab:red',bins=130,alpha=1)
sns.histplot(data=data[data.Kind=='Uniform'], x="Values", ax=ax, label='Uniform', color='C1',bins=17, alpha=.5)
ax.set_xlabel('')
ax.legend()
Note that if you just want to set the colour without alpha you can already do this on a displot via the palette argument - pass in a dictionary of your unique hue values to colour names. However, the alpha that you pass in must be a scalar. I tried to use this clever answer to set colours as RGBA colours which include alpha, which seems to work with other figure level plots in Seaborn. However, displot overrides this and sets the alpha separately!

Make colors in seaborn based on column names

I have a table with three columns: Names, A, B that I use to create a plot with the following code:
import seaborn as sns
sns.scatterplot(data=df, x="B", y="A")
How can I make two different colors for dots based on column names? (i.e. A - red, B - green)
In a scatterplot, each point represents a pair of values relating two sets of data, in your case A and B. Since each point on the graph is such a pair, you can't colour an individual point as either 'A' or 'B'.
What you can do is set a different colour based on your Names column, using the hue argument.
Below is an example using seaborn's tips dataset.
import seaborn as sns
tips = sns.load_dataset("tips")
sns.scatterplot(data=tips, x="total_bill", y="tip", hue="time")
In your case try something like:
sns.scatterplot(data=df, x="B", y="A", hue="Names")
https://seaborn.pydata.org/generated/seaborn.scatterplot.html
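If the goal is instead one colour per column, with A's values in red and B's in green, the table could be reshaped to long form so that the column name becomes a variable that hue can use. This is a sketch based on my guess at the intent: the sample data, the names `column` and `value`, and plotting against Names on the x-axis are all assumptions.

```python
import pandas as pd
import seaborn as sns

# Hypothetical table with the three columns from the question.
df = pd.DataFrame({
    "Names": ["n1", "n2", "n3"],
    "A": [1.0, 2.0, 3.0],
    "B": [2.5, 1.5, 0.5],
})

# Long form: one row per (Names, column) pair.
long_df = df.melt(id_vars="Names", value_vars=["A", "B"],
                  var_name="column", value_name="value")

# Each original column now gets its own colour.
ax = sns.scatterplot(data=long_df, x="Names", y="value", hue="column",
                     palette={"A": "red", "B": "green"})
```

Note that this plots each column against Names rather than A against B, so it answers a different question from the scatterplot above.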

Seaborn chart colors are different than those specified by palette

Why are seaborn chart colors different from the colors specified by the palette?
The following two charts show the difference between the colors as they appear on a bar chart, and the colors as they appear in the palette plot. You can see, if you look carefully, that the colors on the bar chart are slightly less bright/saturated.
Why are these different, and how can I get the bar chart to have the exact same colors as the ones specified in the palette?
import seaborn as sns
sns.set(style="white")
titanic = sns.load_dataset("titanic")
colors = ["windows blue", "amber", "greyish", "faded green", "dusty purple"]
ax = sns.countplot(x="class", data=titanic,
palette=sns.xkcd_palette(colors))
sns.palplot(sns.xkcd_palette(colors))
Bar chart
Palette plot
Many seaborn plotting commands have a saturation argument, whose default value is 0.75. It scales the saturation (S) of the colors in the HLS colorspace by the given proportion (between 0 and 1).
Setting this parameter to 1 in the countplot will give you the same colors in both plots.
ax = sns.countplot(x="class", data=titanic, palette=sns.xkcd_palette(colors), saturation=1)
sns.palplot(sns.xkcd_palette(colors))
The reason for this default desaturation is that many people consider a plot with less contrast to be more appealing. That is also why the default background in seaborn is not white but some bluish gray. After all, this is of course a question of taste.

Colors for pandas timeline graphs with many series

I am using pandas for graphing data for a cluster of nodes. I find that pandas is repeating color values for the different series, which makes them indistinguishable.
I tried giving custom color values like this and passed my_colors to the colors field in plot:
my_colors = []
for node in nodes_list:
    my_colors.append(rand_color())
rand_color() is defined as follows:
def rand_color():
    from random import randrange
    return "#%s" % "".join([hex(randrange(16, 255))[2:] for i in range(3)])
But here, too, I need to avoid color values that are too close to distinguish. I sometimes have as many as 60 nodes (series). Would a hard-coded list of color values most probably be the best option?
You can get a list of colors from any colormap defined in Matplotlib, and even custom colormaps, by:
>>> import matplotlib.pyplot as plt
>>> import numpy as np
>>> colors = plt.cm.Paired(np.linspace(0,1,60))
Plotting an example with these colors:
>>> plt.scatter( range(60), [0]*60, color=colors )
<matplotlib.collections.PathCollection object at 0x04ED2830>
>>> plt.axis("off")
(-10.0, 70.0, -0.0015, 0.0015)
>>> plt.show()
I found the "Paired" colormap to be especially useful for this kind of thing, but you can use any other available or custom colormap.
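One caveat: "Paired" is a qualitative colormap with 12 entries, so 60 samples from it contain repeats. The sketch below counts the distinct colours that come back and passes a sampled list to a pandas plot. The choice of `nipy_spectral` as a continuous alternative, the random-walk data, and the `node{i}` column names are assumptions for illustration only.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.colors import to_hex

n_series = 60
positions = np.linspace(0, 1, n_series)

paired = plt.cm.Paired(positions)           # qualitative, 12 entries
spectral = plt.cm.nipy_spectral(positions)  # continuous alternative

# Count how many distinct RGBA rows each sampling produced.
n_paired = len(np.unique(paired, axis=0))
n_spectral = len(np.unique(spectral, axis=0))
print(n_paired, n_spectral)

# Hypothetical data: one random-walk series per node.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(50, n_series)).cumsum(axis=0),
                  columns=[f"node{i}" for i in range(n_series)])

# One colour per column, in column order.
ax = df.plot(color=[to_hex(c) for c in spectral], legend=False)
```

Even with a continuous colormap, neighbouring colours among 60 series can be hard to tell apart, so varying line style or marker as well may still be worthwhile.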