How to start Seaborn Logarithmic Barplot at y=1 - pandas

I have a problem figuring out how to have Seaborn show the right values in a logarithmic barplot. Ideally, each of my values would be 1. My data series (5, 2, 1, 0.5, 0.2) contains values that deviate from unity, and I want to visualize them in a logarithmic barplot. However, the standard log-barplot shows the following:
But the values below one are drawn rising from -infinity up to their value, whereas they ought to look like this:
Strangely enough, I was unable to find a Seaborn, Pandas or Matplotlib attribute to "snap" the bars to a different horizontal axis, or to "align" them, or to set a ymin/ymax per bar. I suspect I can't find it because I don't know the right terms to feed my favorite search engine. The semi-solutions I found either did not match what I was looking for, or lacked either a baseline at y = 1 or a logarithmic y-axis. Here is an attempt that uses some janky Matplotlib lines:
If someone knows the right terms or a solution, thank you in advance.
Here are the Jupyter cells I used:
{1}
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
data = {'X': ['A','B','C','D','E'], 'Y': [5,2,1,0.5,0.2]}
df = pd.DataFrame(data)
{2}
%matplotlib widget
g = sns.catplot(data=df, kind="bar", y = "Y", x = "X", log = True)
{3}
%matplotlib widget
plt.vlines(x=data['X'], ymin=1, ymax=data['Y'])

You could let the bars start at 1 instead of at 0. You'll need to use sns.barplot directly.
The example code subtracts 1 from all y-values and sets the bar bottom at 1, so each bar runs from 1 to its original value.
import matplotlib.pyplot as plt
from matplotlib.ticker import NullFormatter
import seaborn as sns
import pandas as pd
import numpy as np
data = {'X': ['A', 'B', 'C', 'D', 'E'], 'Y': [5, 2, 1, 0.5, 0.2]}
df = pd.DataFrame(data)
ax = sns.barplot(y=df["Y"] - 1, x=df["X"], bottom=1, log=True, palette='flare_r')
ax.axhline(y=1, c='k')
# change the y-ticks, as the default shows too few in this case
ax.set_yticks(np.append(np.arange(.2, .8, .1), np.arange(1, 7, 1)), minor=False)
ax.set_yticks(np.arange(.3, 6, .1), minor=True)
ax.yaxis.set_major_formatter(lambda x, pos: f'{x:.0f}' if x >= 1 else f'{x:.1f}')
ax.yaxis.set_minor_formatter(NullFormatter())
ax.bar_label(ax.containers[0], labels=df["Y"])
sns.despine()
plt.show()
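The same idea can be sketched with plain Matplotlib, which may help if your seaborn version no longer forwards `bottom` and `log` to the underlying bar call. This is only a minimal sketch under that assumption; the variable names `labels`, `values` and `bar_ends` are illustrative and not part of the answer above.

```python
import numpy as np
import matplotlib.pyplot as plt

labels = ['A', 'B', 'C', 'D', 'E']
values = np.array([5, 2, 1, 0.5, 0.2])

fig, ax = plt.subplots()
# Each bar starts at 1 and has height (value - 1); the height is negative
# for values below 1, so those bars hang down from the baseline.
bars = ax.bar(labels, values - 1, bottom=1)
ax.set_yscale('log')
ax.axhline(y=1, color='k')  # mark the baseline at y = 1

# The far end of every bar (bottom + height) is the original value
bar_ends = [bar.get_y() + bar.get_height() for bar in bars]
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the result.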
PS: With these specific values, the plot might go without logscale:


How make scatterplot in pandas readable

I've been playing with the Titanic dataset and working through some visualisations in Pandas using this tutorial: https://www.kdnuggets.com/2023/02/5-pandas-plotting-functions-might-know.html
I produced a scatterplot using this code.
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('train.csv')
The bootstrap plot result confused me, so I moved on to the scatter matrix.
pd.plotting.scatter_matrix(df, figsize=(10,10))
plt.show()
I can sort of interpret it but I'd like to put the various variables at top and bottom of every column. Is that doable?
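You can draw each pair yourself on a grid of subplots, so every panel gets its own axis labels. Replace `dataset` and the column names with your own:
fig, ax = plt.subplots(4, 3, figsize=(20, 15))
# note: whis is a boxplot parameter and is not accepted by scatterplot, so it is omitted here
sns.scatterplot(x='bedrooms', y='price', data=dataset, ax=ax[0, 0])
sns.scatterplot(x='bathrooms', y='price', data=dataset, ax=ax[0, 1])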
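Alternatively, since `pd.plotting.scatter_matrix` returns the grid of axes it draws on, you can add the variable names as titles above the top row yourself. The sketch below uses a small random DataFrame named `demo` as a stand-in for the numeric Titanic columns; the data and the choice of columns are assumptions for illustration only.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in data; with the real file you would pass numeric columns of df
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(50, 3)), columns=['Age', 'Fare', 'SibSp'])

# scatter_matrix returns a 2D array of matplotlib axes (one per variable pair)
axes = pd.plotting.scatter_matrix(demo, figsize=(6, 6))

# Put each variable name above its column, in addition to the default bottom labels
for j, col in enumerate(demo.columns):
    axes[0, j].set_title(col)
```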

Seaborn swarmplot marker colors

I have my data stored in the following pandas dataframe df.
I am using the following command to plot a box and swarmplot superimposed.
hue_plot_params = {'data': df,'x': 'Genotype','y': 'centroid velocity','hue': 'Surface'}
sns.boxplot(**hue_plot_params)
sns.swarmplot(**hue_plot_params)
However, there is one more category 'Fly' which I would also like to plot as different marker colors in my swarmplot. As the data in each column is made up of multiple 'Fly' types, I would like to have multiple colors for the corresponding markers instead of blue and orange.
(ignore the significance bars, those have been made with another command)
Could someone help me with how to do this?
The following approach iterates through the 'Fly' categories. A swarmplot needs to be created in one go, but a stripplot could be drawn one by one on top of each other.
'Surface' is still used as hue to get them plotted as "dodged", but the palette uses the color for the fly category two times.
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
sns.set()
np.random.seed(2022)
df = pd.DataFrame({'Genotype': np.repeat(['SN1_TNT', 'SN2_TNT', 'SN3_TNT', 'SN4_TNT'], 100),
                   'centroid velocity': 20 - np.random.normal(0.01, 1, 400).cumsum(),
                   'Surface': np.tile(np.repeat(['smooth', 'rough'], 50), 4),
                   'Fly': np.random.choice(['Fly1', 'Fly2', 'Fly3'], 400)})
hue_plot_params = {'x': 'Genotype', 'y': 'centroid velocity', 'hue': 'Surface', 'hue_order': ['smooth', 'rough']}
ax = sns.boxplot(data=df, **hue_plot_params)
handles, _ = ax.get_legend_handles_labels() # get the legend handles for the boxplots
fly_list = sorted(df['Fly'].unique())
for fly, color in zip(fly_list, sns.color_palette('pastel', len(fly_list))):
    sns.stripplot(data=df[df['Fly'] == fly], **hue_plot_params, dodge=True,
                  palette=[color, color], label=fly, ax=ax)
    strip_handles, _ = ax.get_legend_handles_labels()
    strip_handles[-1].set_label(fly)
    handles.append(strip_handles[-1])
ax.legend(handles=handles)
plt.show()
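If picking the last legend handle after each stripplot call does not behave as expected in your seaborn version, one possible fallback is to build the fly legend entries by hand as proxy artists. This is only a sketch: `fly_colors` and `fly_handles` are illustrative names, and Matplotlib's `Pastel1` colormap is an assumed stand-in for the seaborn `pastel` palette used above.

```python
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D

fly_list = ['Fly1', 'Fly2', 'Fly3']

# Assumption: 'Pastel1' replaces seaborn's 'pastel' palette for this sketch
cmap = plt.get_cmap('Pastel1')
fly_colors = {fly: cmap(i) for i, fly in enumerate(fly_list)}

# One proxy marker per fly; these carry no data and only appear in the legend
fly_handles = [Line2D([], [], marker='o', linestyle='', color=color, label=fly)
               for fly, color in fly_colors.items()]

fig, ax = plt.subplots()
legend = ax.legend(handles=fly_handles, title='Fly')
```

In the code above, such proxies could be appended to the boxplot `handles` before calling `ax.legend(handles=handles)`, with `fly_colors[fly]` reused as the stripplot color.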

Matplotlib to Create histogram by Row

I have three arrays that essentially correspond to a matrix of gene expression values and then column labels specifying condition IDs and row values specifying a specific gene. I'm trying to define a function that will plot a histogram by just providing the gene name.
Basically I need to specify YAL001C and create a histogram of the values across that row. I'm very new to matplotlib and I'm not sure how to do this. Would it involve something like an np.where(geneList == 'YAL001C') call? I guess I'm just not sure where that would fit into the matplotlib code.
I currently have the following code, but it doesn't work:
def histogram(gene):
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    x = np.where(geneList == gene, exprMat)
    bins = 50
    ax.hist(x, bins, color = 'green', edgecolor = 'black', alpha = 0.8 )
    plt.show()
In case you want to avoid using pandas, you can still accomplish what you want using numpy, but you need to add some code to figure out which row corresponds to a given gene. Here is one way you could code it:
import numpy as np
import matplotlib.pyplot as plt
data = np.array([[0.15, -0.22, 0.07],
                 [-0.07, -0.76, -0.12],
                 [-1.22, -0.27, -0.1],
                 [-0.09, 1.2, 0.16]
                 ])
def plot_hist(gene):
    list_genes = ['YAL001C', 'YAL002W', 'YAL003W', 'YAL004W']
    if gene in list_genes:
        sn_gene = list_genes.index(gene)
    else:
        print(f'{gene} is not in the list of genes')
        return
    fig, ax = plt.subplots(figsize=(6,4))
    plt.hist(data[sn_gene,:])
    plt.title(f'gene: {gene}')
    plt.show()
plot_hist('YAL001C')
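If the gene names are already in a numpy array, as in the question, the row lookup can also be done with a boolean comparison, which is roughly what the `np.where` idea was reaching for. A minimal sketch follows; `gene_list`, `expr_mat` and `row_for_gene` are illustrative names mirroring the question's `geneList` and `exprMat`, and the values are the toy data from above.

```python
import numpy as np

gene_list = np.array(['YAL001C', 'YAL002W', 'YAL003W', 'YAL004W'])
expr_mat = np.array([[0.15, -0.22, 0.07],
                     [-0.07, -0.76, -0.12],
                     [-1.22, -0.27, -0.1],
                     [-0.09, 1.2, 0.16]])

def row_for_gene(gene):
    """Return the expression values (one row of expr_mat) for the given gene."""
    # Indices where the gene name matches; empty if the gene is unknown
    matches = np.flatnonzero(gene_list == gene)
    if matches.size == 0:
        raise KeyError(f'{gene} is not in the list of genes')
    return expr_mat[matches[0]]

values = row_for_gene('YAL002W')
```

The returned row can then be passed straight to `ax.hist(values, bins=50)`.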
Here is one of the ways you could accomplish that (passing the data related to the corresponding row to the method):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = np.array([[0.15, -0.22, 0.07],
                 [-0.07, -0.76, -0.12],
                 [-1.22, -0.27, -0.1],
                 [-0.09, 1.2, 0.16]
                 ])
df = pd.DataFrame(data=data,
                  index=['YAL001C', 'YAL002W', 'YAL003W', 'YAL004W'],
                  columns=['cln3-1', 'cln3-2', 'clb'])
print(df)
def plot_hist(gene):
    fig, ax = plt.subplots(1,2, figsize=(9,4))
    ax[0].bar(df.columns, df.loc[gene])
    ax[1].hist(df.loc[gene])
    plt.show()
plot_hist('YAL001C')
Left: bar-plot, Right: histogram

how to plot lines linking medians of multiple violin distributions in seaborn?

I am struggling to plot a dotted line connecting the median values (and also the min and max) of each type across stacked violin distributions.
I tried superimposing a seaborn.lineplot on the violin plot, but it failed. I'm not sure this approach lets me draw dotted lines and also link the min and max of distributions of the same type. I also tried using seaborn.lineplot on its own, but there the challenge is to plot the min and max of the distribution at each x-axis value.
Here is an example dataset and the code for the violin plot in seaborn:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
x=[0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.2,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.4,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.6,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8,0.8]
cate=['a','a','a','a','b','b','b','b','c','c','c','c','a','a','a','a','b','b','b','b','c','c','c','c','a','a','a','a','b','b','b','b','c','c','c','c','a','a','a','a','b','b','b','b','c','c','c','c']
y=[1.1,1.12,1.13,1.13,3.1,3.12,3.13,3.13,5.1,5.12,5.13,5.13,2.2,2.22,2.25,2.23,4.2,4.22,4.25,4.23,6.2,6.22,6.25,6.23,2.2,2.22,2.24,2.23,4.2,4.22,4.24,4.23,6.2,6.22,6.24,6.23,1.1,1.13,1.14,1.12,3.1,3.13,3.14,3.12,5.1,5.13,5.14,5.12]
my_pal =['red','green', 'purple']
df = pd.DataFrame({'x': x, 'Type': cate, 'y': y})
ax=sns.catplot(y='y', x='x',data=df, hue='Type', palette=my_pal, kind="violin",dodge =False)
sns.lineplot(y='y', x='x',data=df, hue='Type', palette=my_pal, ci=100,legend=False)
plt.show()
but it plots the lines only over a small part at the left of the plot. Is there a trick to superimpose a lineplot on a violin plot?
For the line plot, 'x' is considered numerical. However, for the violin plot 'x' is considered categorical (positioned at 0, 1, 2, ...).
A solution is to convert 'x' to strings to have both plots consider it as categorical.
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
my_pal = ['red', 'green', 'purple']
N = 40
df = pd.DataFrame({'x': np.random.randint(1, 6, N*3) * 0.2,
                   'y': np.random.uniform(0, 1, N*3) + np.tile([2, 4, 6], N),
                   'Type': np.tile(list('abc'), N)})
df['x'] = [f'{x:.1f}' for x in df['x']]
ax = sns.violinplot(y='y', x='x', data=df, hue='Type', palette=my_pal, dodge=False)
ax = sns.lineplot(y='y', x='x', data=df, hue='Type', palette=my_pal, ci=100, legend=False, ax=ax)
ax.margins(0.15) # slightly more padding for x and y axis
ax.legend(bbox_to_anchor=(1.01, 1), loc='upper left')
plt.tight_layout()
plt.show()
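For explicit control over the median, min and max lines, one option is to compute them per type and x value with pandas and draw them with plain Matplotlib, keeping 'x' as strings so the positions are categorical. The sketch below uses a small made-up subset of the question's data; `summary` and the line styles are illustrative choices rather than part of the answer above.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Small illustrative subset; 'x' is stored as strings so it is treated as categorical
df = pd.DataFrame({
    'x': ['0.2'] * 4 + ['0.4'] * 4,
    'Type': ['a', 'a', 'b', 'b'] * 2,
    'y': [1.10, 1.13, 3.10, 3.13, 2.20, 2.25, 4.20, 4.25],
})

# One row per (Type, x) with the min, median and max of y
summary = df.groupby(['Type', 'x'])['y'].agg(['min', 'median', 'max']).reset_index()

fig, ax = plt.subplots()
for type_name, grp in summary.groupby('Type'):
    # Dashed line with dots through the medians
    line, = ax.plot(grp['x'], grp['median'], 'o--', label=type_name)
    # Dotted lines through the min and max, in the same color
    ax.plot(grp['x'], grp['min'], ':', color=line.get_color())
    ax.plot(grp['x'], grp['max'], ':', color=line.get_color())
ax.legend(title='Type')
```

Passing the `ax` returned by `sns.violinplot` instead of creating a new one should overlay these lines on the violins, provided the category order matches.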

"panel barchart" in matplotlib

I would like to produce a figure like this one using matplotlib:
(source: peltiertech.com)
My data are in a pandas DataFrame, and I've gotten as far as a regular stacked barchart, but I can't figure out how to do the part where each category is given its own y-axis baseline.
Ideally I would like the vertical scale to be exactly the same for all the subplots and move the panel labels off to the side so there can be no gaps between the rows.
I haven't exactly replicated what you want but this should get you pretty close.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
#create dummy data
cols = ['col'+str(i) for i in range(10)]
ind = ['ind'+str(i) for i in range(10)]
df = pd.DataFrame(np.random.normal(loc=10, scale=5, size=(10, 10)), index=ind, columns=cols)
#create plot
sns.set_style("whitegrid")
axs = df.plot(kind='bar', subplots=True, sharey=True,
              figsize=(6, 5), legend=False, yticks=[],
              grid=False, ylim=(0, 14), edgecolor='none',
              fontsize=14, color=[sns.xkcd_rgb["brownish red"]])
plt.text(-1, 100, "The y-axis label", fontsize=14, rotation=90) # add a y-label with custom positioning
sns.despine(left=True) # get rid of the axes
for ax in axs: # set the names beside the axes
    ax.lines[0].set_visible(False) # remove ugly dashed line
    ax.set_title('')
    sername = ax.get_legend_handles_labels()[1][0]
    ax.text(9.8, 5, sername, fontsize=14)
plt.suptitle("My panel chart", fontsize=18)
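A variant that addresses the "same vertical scale, no gaps, labels at the side" part directly is to build the panels with `plt.subplots` and zero vertical spacing. This is a rough sketch with made-up data, not a replica of the target figure; the DataFrame contents and label placement are assumptions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up data: one column per panel, one row per bar
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.uniform(1, 10, size=(5, 3)),
                  index=[f'ind{i}' for i in range(5)],
                  columns=['col0', 'col1', 'col2'])

# sharey=True forces an identical vertical scale; hspace=0 removes the gaps
fig, axs = plt.subplots(len(df.columns), 1, sharex=True, sharey=True,
                        gridspec_kw={'hspace': 0})
for ax, col in zip(axs, df.columns):
    ax.bar(df.index, df[col])
    # Put the panel name to the left of its axis instead of above it
    ax.set_ylabel(col, rotation=0, ha='right', va='center')
fig.suptitle('My panel chart')
```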