Seaborn lineplot, different markers for different boolean values - dataframe

I have a dataframe with 3 columns: champion (categorical, holds string values), total damage done (numerical), and win (Boolean, True or False). I want to draw a line whose markers are "o" where win == True and "x" where win == False. I tried the code attached here, but it doesn't work. It gives ValueError: Filled and line art markers cannot be mixed. I tried hue and style, but they change the line style rather than the marker. I also tried passing my win column to style and setting the markers from that, but that didn't work either. Can anyone help?
Thanks
fig = plt.figure(figsize=(12,8))
h = sns.lineplot(data=skyhill_all,x='champion',y='totalDamageDealt',style='win',markers=['o','x'])
h.yaxis.set_minor_locator(AutoMinorLocator())
h.tick_params(which='both',width=2)
h.tick_params(which='major',length=8)
h.tick_params(which='minor',length=4)
h.set_ylabel('Total Damage Done')
h.set_xlabel('Played Champions')
h.set_yticks(np.arange(5000,75000,5000))
print(h)

The simplest approach is to draw a line graph in matplotlib, then set the markers and colors in a scatter plot. The rest is setting the seaborn style. You can choose your own style in the future.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
skyhill_all = pd.DataFrame({'champion':list('ABCDEF'),
'totalDamageDealt':np.random.randint(100,10000,6),
'win': [True,False,False,True,True,False]})
plt.style.use('seaborn-white')
fig, ax = plt.subplots()
m = ['o' if x == 1 else 'x' for x in skyhill_all['win']]
c = ['orange' if x == 1 else 'blue' for x in skyhill_all['win']]
ax.plot(skyhill_all['champion'], skyhill_all['totalDamageDealt'])
for i in range(len(skyhill_all)):
    ax.scatter(skyhill_all['champion'][i], skyhill_all['totalDamageDealt'][i], marker=m[i], color=c[i])
plt.show()
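A possible variation on the loop above, offered as a sketch rather than a replacement: draw the line once, then call ax.scatter once per outcome using a boolean mask. Each marker style then gets its own legend entry. The sample data, the gray line color, and the 'win'/'loss' legend labels are assumptions made up for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sample data with the same columns as in the question
rng = np.random.default_rng(0)
skyhill_all = pd.DataFrame({
    'champion': list('ABCDEF'),
    'totalDamageDealt': rng.integers(100, 10000, 6),
    'win': [True, False, False, True, True, False],
})

fig, ax = plt.subplots()

# One continuous line through all points, drawn below the markers
ax.plot(skyhill_all['champion'], skyhill_all['totalDamageDealt'],
        color='gray', zorder=1)

# One scatter call per outcome: (marker, color, legend label)
styles = {True: ('o', 'orange', 'win'), False: ('x', 'blue', 'loss')}
for outcome, (marker, color, label) in styles.items():
    subset = skyhill_all[skyhill_all['win'] == outcome]
    ax.scatter(subset['champion'], subset['totalDamageDealt'],
               marker=marker, color=color, label=label, zorder=2)

ax.legend()
# Call plt.show() here to display the figure interactively.
```

Because the markers are drawn by scatter and not by the line itself, filled and unfilled markers can be combined freely.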


Equivalent of Hist()'s Layout hyperparameter in Sns.Pairplot?

I am trying to find an equivalent of hist()'s figsize and layout parameters for sns.pairplot().
I have a pairplot that gives me nice scatterplots between the X's and y. However, it is oriented horizontally and there is no equivalent layout parameter to make them vertical to my knowledge. 4 plots per row would be great.
This is my current sns.pairplot():
sns.pairplot(X_train,
             x_vars = X_train.select_dtypes(exclude=['object']).columns,
             y_vars = ["SalePrice"])
This is what I would like it to look like: Source
num_mask = train_df.dtypes != object
num_cols = train_df.loc[:, num_mask[num_mask == True].keys()]
num_cols.hist(figsize = (30,15), layout = (4,10))
plt.show()
What you want to achieve isn't currently supported by sns.pairplot, but you can use one of the other figure-level functions (sns.displot, sns.catplot, ...). sns.lmplot creates a grid of scatter plots. For this to work, the dataframe needs to be in "long form".
Here is a simple example. sns.lmplot has parameters to leave out the regression line (fit_reg=False), to set the height of the individual subplots (height=...), to set its aspect ratio (aspect=..., where the subplot width will be height times aspect ratio), and many more. If all y ranges are similar, you can use the default sharey=True.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# create some test data with different y-ranges
np.random.seed(20230209)
X_train = pd.DataFrame({"".join(np.random.choice([*'uvwxyz'], np.random.randint(3, 8))):
                            np.random.randn(100).cumsum() + np.random.randint(100, 1000) for _ in range(10)})
X_train['SalePrice'] = np.random.randint(10000, 100000, 100)
# convert the dataframe to long form
# 'SalePrice' will get excluded automatically via `melt`
compare_columns = X_train.select_dtypes(exclude=['object']).columns
long_df = X_train.melt(id_vars='SalePrice', value_vars=compare_columns)
# create a grid of scatter plots
g = sns.lmplot(data=long_df, x='SalePrice', y='value', col='variable', col_wrap=4, sharey=False)
g.set(ylabel='')
plt.show()
Here is another example, with histograms of the mpg dataset:
import matplotlib.pyplot as plt
import seaborn as sns
mpg = sns.load_dataset('mpg')
compare_columns = mpg.select_dtypes(exclude=['object']).columns
mpg_long = mpg.melt(value_vars=compare_columns)
g = sns.displot(data=mpg_long, kde=True, x='value', common_bins=False, col='variable', col_wrap=4, color='crimson',
                facet_kws={'sharex': False, 'sharey': False})
g.set(xlabel='')
plt.show()
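To make "long form" concrete, here is a tiny sketch of what melt does to a wide table. The column names LotArea and YearBuilt and all values are made up for illustration; only SalePrice comes from the question.

```python
import pandas as pd

# Tiny made-up wide table: one row per house, one column per feature
wide = pd.DataFrame({
    'SalePrice': [100, 200, 300],
    'LotArea': [10, 20, 30],
    'YearBuilt': [1990, 2000, 2010],
})

# Long form: one row per (house, feature) pair.
# 'variable' holds the former column name, 'value' holds its entry.
long_df = wide.melt(id_vars='SalePrice', value_vars=['LotArea', 'YearBuilt'])
print(long_df)
```

The resulting 'variable' column is what gets passed as col=... to the figure-level functions, so each former column ends up in its own subplot.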

Add text flush left below plot in python

I'd like to add text beneath a plot, which includes the source of the used data.
It should be positioned at the edge of the image, so beneath the longest ytick and if possible at a fixed vertical distance to the x-axis.
My approach:
import matplotlib.pyplot as plt
country = ['Portugal','Spain','Austria','Italy','France','Federal Republic of Germany']
value = [6,8,10,12,14,25]
plt.figure(figsize=(4,4))
plt.barh(country,value)
plt.xlabel('x-axis')
plt.text(-18,-2.5,'Source: blablablablablablablablablablablablablablablablabla',ha='left')
I used plt.text(). My problem with the command is, that I have to manually try x and y values (in the code: -18,-2.5) for different plots.
Is there a better way?
Thanks in advance.
First, I get the bounding boxes of the y tick labels and take the leftmost x position among them. Then I use a blended transform (x in display pixels, y in axes coordinates) to place the text, with small manual offsets.
import matplotlib.pyplot as plt
from matplotlib.transforms import IdentityTransform
import matplotlib.transforms as transforms
country = ['Portugal','Spain','Austria','Italy','France','Federal Republic of Germany']
value = [6,8,10,12,14,25]
plt.figure(figsize=(4,4))
plt.barh(country,value)
plt.xlabel('x-axis')
ax = plt.gca()
fig = plt.gcf()
fig.tight_layout()
fig.canvas.draw()
labs = ax.get_yticklabels()
xlocs = []
for ilab in labs:
    xlocs.append(ilab.get_window_extent().x0)
print(xlocs)
x0 = min(xlocs)
trans = transforms.blended_transform_factory(IdentityTransform(), ax.transAxes)
plt.text(x0-2.5,-0.2,'Source: blablablablablablablablablablablablablablablablabla',ha='left',transform=trans)
plt.savefig("flush.png",bbox_inches="tight")
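The same steps could be wrapped in a small reusable function, sketched below. The helper name add_source_note and its parameters y_axes and x_offset_px are hypothetical, not part of matplotlib. Since the x position is measured in display pixels, this sketch assumes the figure is saved at the same dpi it was drawn with.

```python
import matplotlib.pyplot as plt
from matplotlib.transforms import IdentityTransform, blended_transform_factory


def add_source_note(ax, text, y_axes=-0.2, x_offset_px=0.0):
    """Hypothetical helper: place `text` flush with the leftmost y tick label."""
    fig = ax.figure
    fig.canvas.draw()  # tick label extents are only reliable after a draw
    left_px = min(label.get_window_extent().x0 for label in ax.get_yticklabels())
    # x is given in display pixels, y as a fraction of the axes height
    trans = blended_transform_factory(IdentityTransform(), ax.transAxes)
    return ax.text(left_px + x_offset_px, y_axes, text,
                   ha='left', va='top', transform=trans)


country = ['Portugal', 'Spain', 'Austria', 'Italy', 'France',
           'Federal Republic of Germany']
value = [6, 8, 10, 12, 14, 25]

fig, ax = plt.subplots(figsize=(4, 4))
ax.barh(country, value)
ax.set_xlabel('x-axis')
fig.tight_layout()

note = add_source_note(ax, 'Source: blablabla')
# fig.savefig('flush.png', bbox_inches='tight')  # uncomment to write the image
```

The vertical position stays in axes coordinates, so y_axes=-0.2 means the same distance below the x-axis for every plot of the same size.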

Seaborn swarmplot marker colors

I have my data stored in the following pandas dataframe df.
I am using the following command to plot a box and swarmplot superimposed.
hue_plot_params = {'data': df,'x': 'Genotype','y': 'centroid velocity','hue': 'Surface'}
sns.boxplot(**hue_plot_params)
sns.swarmplot(**hue_plot_params)
However, there is one more category 'Fly' which I would also like to plot as different marker colors in my swarmplot. As the data in each column is made up of multiple 'Fly' types, I would like to have multiple colors for the corresponding markers instead of blue and orange.
(ignore the significance bars, those have been made with another command)
Could someone help me with how to do this?
The following approach iterates through the 'Fly' categories. A swarmplot needs to be created in one go, but a stripplot could be drawn one by one on top of each other.
'Surface' is still used as hue to get them plotted as "dodged", but the palette uses the color for the fly category two times.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
sns.set()
np.random.seed(2022)
df = pd.DataFrame({'Genotype': np.repeat(['SN1_TNT', 'SN2_TNT', 'SN3_TNT', 'SN4_TNT'], 100),
'centroid velocity': 20 - np.random.normal(0.01, 1, 400).cumsum(),
'Surface': np.tile(np.repeat(['smooth', 'rough'], 50), 4),
'Fly': np.random.choice(['Fly1', 'Fly2', 'Fly3'], 400)})
hue_plot_params = {'x': 'Genotype', 'y': 'centroid velocity', 'hue': 'Surface', 'hue_order': ['smooth', 'rough']}
ax = sns.boxplot(data=df, **hue_plot_params)
handles, _ = ax.get_legend_handles_labels() # get the legend handles for the boxplots
fly_list = sorted(df['Fly'].unique())
for fly, color in zip(fly_list, sns.color_palette('pastel', len(fly_list))):
    sns.stripplot(data=df[df['Fly'] == fly], **hue_plot_params, dodge=True,
                  palette=[color, color], label=fly, ax=ax)
    strip_handles, _ = ax.get_legend_handles_labels()
    strip_handles[-1].set_label(fly)
    handles.append(strip_handles[-1])
ax.legend(handles=handles)
plt.show()

Matplotlib: How to plot line plots for multiple years with month and day on x axis?

I have a dataframe similar to the following:
Using the following code, I am able to plot a chart:
fig , ax = plt.subplots(nrows=1, ncols=1,figsize=(15,8))
# colors = {1:'red', 2:'green', 3:'blue', 4: 'yellow', 5:'aqua', 6:'salmon', 7:'plum', 8:'khaki', 9:'sienna', 10:'yellowgreen',
# 11:'cyan', 12: 'gold'}
months = euro_to_dollar["month"].unique()
for m in months:
    ax.plot(euro_to_dollar[(euro_to_dollar["Time"].dt.strftime('%Y') == '2020') & (euro_to_dollar["month"]== m)]["dayofmonth"],
            euro_to_dollar[(euro_to_dollar["Time"].dt.strftime('%Y') == '2020') & (euro_to_dollar["month"]== m)]["US_dollar"],
            alpha = 0.5, label = m)
    #color = euro_to_dollar[(euro_to_dollar["Time"].dt.strftime('%Y') == '2020') & (euro_to_dollar["month"]== m)]["month"].map(colors))
ax.grid(b=False)
ax.set_xticks(np.arange(1,len(euro_to_dollar["dayofmonth"].unique())))
ax.set_title("Month vs euro_dollar_rate Mean")
ax.legend(loc='best')
plt.show()
My questions:
I tried typing in the colors manually and using the map function as below:
color = euro_to_dollar[(euro_to_dollar["Time"].dt.strftime('%Y') == '2020') & (euro_to_dollar["month"]== m)]["month"].map(colors)
but it failed with the error: ValueError: Invalid RGBA argument: 5375 red. What is this error and how do I handle it?
How do I customize the colors for each category dynamically? I don't want to type in the colors manually as below:
colors = {1:'red', 2:'green', 3:'blue', 4: 'yellow', 5:'aqua', 6:'salmon', 7:'plum', 8:'khaki', 9:'sienna', 10:'yellowgreen',
11:'cyan', 12: 'gold'}
I have seen plots where others used different matplotlib colors by importing the cm module from matplotlib. My question is how to assign colors to N categories dynamically, so that each category is represented by its own color. I have seen others using numpy functions like this:
viridis = cm.get_cmap('viridis', 256)
newcolors = viridis(np.linspace(0, 1, 256))
pink = np.array([248/256, 24/256, 148/256, 1])
Thanks!
Solutions to questions
1. ValueError
The error seems to be caused by the fact that color is a pandas series object instead of a string or other valid color object. You can solve this by getting the appropriate string directly from the colors dictionary like this: color = colors[m].
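As a minimal sketch of that fix, the loop below passes one color string per line. The dataframe is made-up sample data with only two months; the column names follow the question.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data: two months, three days each
df = pd.DataFrame({
    'month': [1, 1, 1, 2, 2, 2],
    'dayofmonth': [1, 2, 3, 1, 2, 3],
    'US_dollar': [1.10, 1.11, 1.12, 1.09, 1.08, 1.10],
})
colors = {1: 'red', 2: 'green'}

fig, ax = plt.subplots()
for m in df['month'].unique():
    sub = df[df['month'] == m]
    # color receives a single string here, not a pandas Series of strings
    ax.plot(sub['dayofmonth'], sub['US_dollar'], color=colors[m], label=m)
ax.legend()
```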
2. Extracting colors from a colormap
The first section of the matplotlib tutorial Creating Colormaps in Matplotlib shows how to extract colors from the colormaps. As it is explained, there are two types of colormap objects in matplotlib (ListedColormap and LinearSegmentedColormap) which have partially different methods to extract the colors. Note that the documentation page displaying the built-in colormaps does not show what type of colormap object is used for each colormap. You get that information when calling the colormap:
plt.get_cmap('viridis')
# <matplotlib.colors.ListedColormap at 0x16b5859b790>
To get an overview table of all colormaps and their object type, you can run this (see also this question):
for cmap in plt.colormaps():
    print(f'{cmap:<20} {str(plt.get_cmap(cmap)).split(".")[-1].split()[0]}')
To answer your question, here is a way to extract as many colors as you need from either type of colormap; you can then index the result with the month number minus 1:
months = euro_to_dollar['month'].unique()
cmap = plt.get_cmap('any_colormap_name')
colors = cmap(np.linspace(0, 1, len(months)))
for m in months:
    ax.plot(...,
            color = colors[m-1])
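A runnable sketch of the same idea follows, under the assumption that the categories may not be the contiguous numbers 1 to 12. Instead of indexing with m-1, it builds a dictionary from category to color. The category values, the 'viridis' choice, and the plotted data are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

months = np.array([3, 7, 11])  # made-up, non-contiguous categories
cmap = plt.get_cmap('viridis')

# Sample N evenly spaced colors; the result is an (N, 4) RGBA array
sampled = cmap(np.linspace(0, 1, len(months)))

# Map each category to its own RGBA color
color_of = dict(zip(months.tolist(), sampled))

fig, ax = plt.subplots()
x = np.arange(1, 6)
for m in months.tolist():
    ax.plot(x, x * m, color=color_of[m], label=f'month {m}')
ax.legend()
```

With a dictionary, the lookup keeps working if some months are missing from the data or if the categories are strings.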
Suggestion: use seaborn lineplot function
The code can be simplified by using the lineplot function of the seaborn package. You can choose any matplotlib colormap with palette and activate coloring according to the months with hue='month'. Here is a complete example.
Create sample dataset
import numpy as np # v 1.20.2
import pandas as pd # v 1.2.5
import matplotlib.pyplot as plt # v 3.3.4
import seaborn as sns # v 0.11.1
rng = np.random.default_rng(seed=123) # random number generator
bdate = pd.bdate_range(start='2019-01-01', end='2021-07-24')
daily_value_change = rng.normal(loc=0, scale=0.005, size=bdate.size)
value = 1.1 + np.cumsum(daily_value_change)
euro_to_dollar = pd.DataFrame(dict(Time=bdate, US_dollar=value))
euro_to_dollar['year'] = euro_to_dollar['Time'].dt.year
euro_to_dollar['month'] = euro_to_dollar['Time'].dt.month
euro_to_dollar['dayofmonth'] = euro_to_dollar['Time'].dt.day
euro_to_dollar.head()
Select year and plot data with seaborn
euro_to_dollar_2020 = euro_to_dollar[euro_to_dollar['year'] == 2020]
ax = sns.lineplot(data=euro_to_dollar_2020, x='dayofmonth', y='US_dollar',
                  hue='month', palette='viridis')
ax.figure.set_size_inches(10, 6)
ax.set_xticks(euro_to_dollar['dayofmonth'].unique())
ax.set_title('Month vs euro_dollar_rate Mean')
ax.legend(ax.lines, euro_to_dollar['month'].unique(), title='month')
plt.show()

Matplotlib histogram with errorbars

I have created a histogram with matplotlib using the pyplot.hist() function. I would like to add a Poisson error, the square root of the bin height (sqrt(binheight)), to the bars. How can I do this?
The return tuple of .hist() includes return[2] -> a list of 1 Patch objects. I could only find out that it is possible to add errors to bars created via pyplot.bar().
Indeed, you need to use bar. You can take the output of a histogram computation and plot it as a bar chart:
import numpy as np
import pylab as plt
data = np.array(np.random.rand(1000))
y,binEdges = np.histogram(data,bins=10)
bincenters = 0.5*(binEdges[1:]+binEdges[:-1])
menStd = np.sqrt(y)
width = 0.05
plt.bar(bincenters, y, width=width, color='r', yerr=menStd)
plt.show()
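A small variation on the snippet above, as a sketch: the bar width is taken from the bin edges instead of a hard-coded 0.05, so the bars touch like in a regular histogram. The seed, edgecolor, and capsize are arbitrary choices for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
data = rng.random(1000)

counts, bin_edges = np.histogram(data, bins=10)
bin_centers = 0.5 * (bin_edges[1:] + bin_edges[:-1])
bin_widths = np.diff(bin_edges)  # one width per bin, taken from the edges
errors = np.sqrt(counts)         # Poisson error: sqrt of the bin height

fig, ax = plt.subplots()
ax.bar(bin_centers, counts, width=bin_widths, yerr=errors,
       color='r', edgecolor='k', capsize=2)
# Call plt.show() here to display the figure interactively.
```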
Alternative Solution
You can also use a combination of pyplot.errorbar() and drawstyle keyword argument. The code below creates a plot of the histogram using a stepped line plot. There is a marker in the center of each bin and each bin has the requisite Poisson errorbar.
import numpy
from matplotlib import pyplot
x = numpy.random.rand(1000)
y, bin_edges = numpy.histogram(x, bins=10)
bin_centers = 0.5*(bin_edges[1:] + bin_edges[:-1])
# stepped line through the bin centers, with sqrt(N) error bars
pyplot.errorbar(
    bin_centers,
    y,
    yerr = y**0.5,
    marker = '.',
    drawstyle = 'steps-mid'
)
pyplot.show()
My personal opinion
When plotting the results of multiple histograms on the same figure, line plots are easier to distinguish. In addition, they look nicer when plotting with yscale='log'.