seaborn or matplotlib line chart, line color depending on variable - pandas

I have a pandas dataframe with three columns, Date(timestamp), Color('red' or 'blue') and Value(int).
I am currently getting a line chart from it with the following code:
import matplotlib.pyplot as plt
import pandas as pd
Dates=['01/01/2014','02/01/2014','03/01/2014','04/01/2014','05/01/2014','06/01/2014','07/01/2014']
Values=[3,4,6,5,4,5,4]
Colors=['red','red','blue','blue','blue','red','red']
df=pd.DataFrame({'Dates':Dates,'Values':Values,'Colors':Colors})
df['Dates']=pd.to_datetime(df['Dates'],dayfirst=True)
grouped = df.groupby('Colors')
fig, ax = plt.subplots()
for key, group in grouped:
    group.plot(ax=ax, x="Dates", y="Values", label=key, color=key)
plt.show()
I'd like the line color to depend on the 'Colors' column. How can I achieve that?
I have seen a similar question for scatterplots, but I can't seem to apply the same solution to a time series line chart.
My output is currently this:
I am trying to achieve something like this (one line only, but several colors)

As I said you could find the answer from the link I attached in the comment:
import matplotlib.pyplot as plt
import pandas as pd
Dates = ['01/01/2014', '02/01/2014', '03/01/2014', '03/01/2014', '04/01/2014', '05/01/2014']
Values = [3, 4, 6, 6, 5, 4]
Colors = ['red', 'red', 'red', 'blue', 'blue', 'blue']
df = pd.DataFrame({'Dates': Dates, 'Values': Values, 'Colors': Colors})
df['Dates'] = pd.to_datetime(df['Dates'], dayfirst=True)
grouped = df.groupby('Colors')
fig, ax = plt.subplots(1)
for key, group in grouped:
    group.plot(ax=ax, x="Dates", y="Values", label=key, color=key)
plt.show()
Where the color changes, you need to add an extra point (the boundary point repeated with the other color, like 03/01/2014 above) to keep the line continuous.
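Grouping by 'Colors' alone joins every point of one color into a single line, so with the question's red, blue, red data a red segment would be drawn straight across the blue stretch. Here is a minimal sketch of one way around that, assuming the rows are already sorted by date: split the frame into runs of consecutive identical colors and extend each run by the first row of the next one, which plays the role of the extra point. The helper name color_runs is hypothetical, not a pandas or matplotlib function.

```python
import matplotlib.pyplot as plt
import pandas as pd

# The question's data: red, then blue, then red again.
df = pd.DataFrame({
    "Dates": pd.to_datetime(
        ["01/01/2014", "02/01/2014", "03/01/2014", "04/01/2014",
         "05/01/2014", "06/01/2014", "07/01/2014"],
        dayfirst=True,
    ),
    "Values": [3, 4, 6, 5, 4, 5, 4],
    "Colors": ["red", "red", "blue", "blue", "blue", "red", "red"],
})


def color_runs(frame, color_col="Colors"):
    """Yield (color, rows) for each run of consecutive identical colors.

    Hypothetical helper: every run except the last is extended by the first
    row of the following run, so neighbouring segments share a point.
    """
    frame = frame.reset_index(drop=True)
    # A new run starts wherever the color differs from the previous row.
    run_id = (frame[color_col] != frame[color_col].shift()).cumsum()
    for _, run in frame.groupby(run_id):
        first, last = run.index[0], run.index[-1]
        stop = min(last + 1, len(frame) - 1)
        yield run[color_col].iloc[0], frame.iloc[first:stop + 1]


fig, ax = plt.subplots()
for color, segment in color_runs(df):
    ax.plot(segment["Dates"], segment["Values"], color=color)
fig.autofmt_xdate()
```

Call plt.show() to display the figure. Each connecting segment takes the color of the run it leaves, which matches the duplicated-point data in the answer above.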

Related

Python pyplot scatter is not using colors

I am trying to plot a scatter chart with pandas and matplotlib.pyplot. The dots in the graph are only using one color, while the legend shows three different colors for three different groups of data.
Below is my code and a screenshot. You can see that all the dots are green. Could anyone point out why? What did I do wrong?
Thanks a lot in advance.
import pandas as pd
import matplotlib.pyplot as plt
data = {
    'x':[1,2,3,4,1,3,7,5],
    'y':[10, 20, 30, 40, 20, 30, 40, 80],
    'label':['A', 'A','B','B','A','C','C','A']
}
df = pd.DataFrame(data)
plt.figure(figsize=(34,8))
fig,ax = plt.subplots()
#sns.scatterplot(data=df, hue='label', x='x', y='y')
for k, d in df.groupby('label'):
    ax.scatter(df['x'], df['y'], label=k)
plt.legend()
plt.show()
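Inside the loop you need to plot each group's own rows (d) rather than the whole DataFrame (df); otherwise every call draws all the points and the last color covers the others. You can also add a color mapping. Slight modifications to your code, using an iterator of colors: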
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
data = {
    'x':[1,2,3,4,1,3,7,5],
    'y':[10, 20, 30, 40, 20, 30, 40, 80],
    'label':['A', 'A','B','B','A','C','C','A']
}
df = pd.DataFrame(data)
#plt.figure(figsize=(34,8))
fig,ax = plt.subplots()
df1 = df.groupby('label')
colors = iter(cm.rainbow(np.linspace(0, 1, len(df1.groups))))
for k, d in df1:
    ax.scatter(d['x'], d['y'], label=k, color=next(colors))
plt.legend()
plt.show()
outputs the scatter plot as:
Is this your desired output?
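If you would rather pin a specific color to each label than sample a colormap, a plain dictionary also works. This is a minimal sketch; the palette name and the colors chosen in it are assumptions for illustration.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "x": [1, 2, 3, 4, 1, 3, 7, 5],
    "y": [10, 20, 30, 40, 20, 30, 40, 80],
    "label": ["A", "A", "B", "B", "A", "C", "C", "A"],
})

# Hypothetical label-to-color mapping; any valid matplotlib colors would do.
palette = {"A": "tab:blue", "B": "tab:orange", "C": "tab:green"}

fig, ax = plt.subplots()
for label, group in df.groupby("label"):
    # Plot only this group's rows, in the color assigned to its label.
    ax.scatter(group["x"], group["y"], label=label, color=palette[label])
ax.legend()
```

Call plt.show() to display it. A dictionary keeps the colors stable even if a label is missing from one dataset, at the cost of having to list every label up front.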

Matplotlib to Create histogram by Row

I have three arrays that essentially correspond to a matrix of gene expression values and then column labels specifying condition IDs and row values specifying a specific gene. I'm trying to define a function that will plot a histogram by just providing the gene name.
Basically I need to specify YAL001C and create a histogram of the values across the row. I'm very new to matplotlib and I'm not sure how to do this. Would it involve something like an np.where(gene = YAL001C) argument? I guess I'm just not sure where that would fit into the matplotlib code.
I currently have the following code, but it doesn't work:
def histogram(gene):
    fig = plt.figure()
    ax = fig.add_subplot(1, 1, 1)
    x = np.where(geneList == gene, exprMat)
    bins = 50
    ax.hist(x, bins, color = 'green', edgecolor = 'black', alpha = 0.8 )
    plt.show()
In case you want to avoid using pandas, you can still accomplish what you want using numpy, but you need to add some code to figure out which row corresponds to a given gene. Here is one way you could code it:
import numpy as np
import matplotlib.pyplot as plt
data = np.array([[0.15, -0.22, 0.07],
                 [-0.07, -0.76, -0.12],
                 [-1.22, -0.27, -0.1],
                 [-0.09, 1.2, 0.16]
                 ])
def plot_hist(gene):
    list_genes = ['YAL001C', 'YAL002W', 'YAL003W', 'YAL004W']
    if gene in list_genes:
        sn_gene = list_genes.index(gene)
    else:
        print(f'{gene} is not in the list of genes')
        return
    fig, ax = plt.subplots(figsize=(6,4))
    plt.hist(data[sn_gene,:])
    plt.title(f'gene: {gene}')
    plt.show()
plot_hist('YAL001C')
Here is one of the ways you could accomplish that (passing the data related to the corresponding row to the method):
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = np.array([[0.15, -0.22, 0.07],
                 [-0.07, -0.76, -0.12],
                 [-1.22, -0.27, -0.1],
                 [-0.09, 1.2, 0.16]
                 ])
df = pd.DataFrame(data=data,
                  index=['YAL001C', 'YAL002W', 'YAL003W', 'YAL004W'],
                  columns=['cln3-1', 'cln3-2', 'clb'])
print(df)
def plot_hist(gene):
    fig, ax = plt.subplots(1,2, figsize=(9,4))
    ax[0].bar(df.columns, df.loc[gene])
    ax[1].hist(df.loc[gene])
    plt.show()
plot_hist('YAL001C')
Left: bar-plot, Right: histogram
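As for the np.where idea in the question: the one-argument form returns the indices where a condition holds, which can then select the row. This is a minimal sketch, assuming the gene names are stored in a NumPy array aligned with the rows of the matrix; gene_list, expr_mat and row_for_gene are hypothetical names standing in for the question's arrays.

```python
import numpy as np

# Assumed layout: one gene name per row of the expression matrix.
gene_list = np.array(["YAL001C", "YAL002W", "YAL003W", "YAL004W"])
expr_mat = np.array([[0.15, -0.22, 0.07],
                     [-0.07, -0.76, -0.12],
                     [-1.22, -0.27, -0.1],
                     [-0.09, 1.2, 0.16]])


def row_for_gene(gene):
    """Return the expression values for `gene`, or None if it is unknown."""
    # np.where(condition) returns a tuple with one index array per dimension.
    (hits,) = np.where(gene_list == gene)
    if hits.size == 0:
        return None
    return expr_mat[hits[0]]
```

The returned row can be passed straight to ax.hist. The question's call failed because np.where accepts either the condition alone or the condition plus two alternative arrays, not the condition plus one.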

How to start Seaborn Logarithmic Barplot at y=1

I have a problem figuring out how to have Seaborn show the right values in a logarithmic barplot. My values should, in the ideal case, be 1. My data series (5,2,1,0.5,0.2) has a set of values that deviate from unity and I want to visualize these in a logarithmic barplot. However, when plotting this in the standard log-barplot it shows the following:
But the values under one are shown to increase from -infinity to their value, whilst the real values ought to look like this:
Strangely enough, I was unable to find a Seaborn, Pandas or Matplotlib attribute to "snap" to a different horizontal axis or "align" or ymin/ymax. I have a feeling I am unable to find it because I can't find the terms to shove down my favorite search engine. Some semi-solutions I found just did not match what I was looking for or did not have either xaxis = 1 or a ylog. A try that uses some jank Matplotlib lines:
If someone knows the right terms or a solution, thank you in advance.
Here are the Jupyter cells I used:
{1}
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
data = {'X': ['A','B','C','D','E'], 'Y': [5,2,1,0.5,0.2]}
df = pd.DataFrame(data)
{2}
%matplotlib widget
g = sns.catplot(data=df, kind="bar", y = "Y", x = "X", log = True)
{3}
%matplotlib widget
plt.vlines(x=data['X'], ymin=1, ymax=data['Y'])
You could let the bars start at 1 instead of at 0. You'll need to use sns.barplot directly.
The example code subtracts 1 from all y-values and sets the bar bottom at 1.
import matplotlib.pyplot as plt
from matplotlib.ticker import NullFormatter
import seaborn as sns
import pandas as pd
import numpy as np
data = {'X': ['A', 'B', 'C', 'D', 'E'], 'Y': [5, 2, 1, 0.5, 0.2]}
df = pd.DataFrame(data)
ax = sns.barplot(y=df["Y"] - 1, x=df["X"], bottom=1, log=True, palette='flare_r')
ax.axhline(y=1, c='k')
# change the y-ticks, as the default shows too few in this case
ax.set_yticks(np.append(np.arange(.2, .8, .1), np.arange(1, 7, 1)), minor=False)
ax.set_yticks(np.arange(.3, 6, .1), minor=True)
ax.yaxis.set_major_formatter(lambda x, pos: f'{x:.0f}' if x >= 1 else f'{x:.1f}')
ax.yaxis.set_minor_formatter(NullFormatter())
ax.bar_label(ax.containers[0], labels=df["Y"])
sns.despine()
plt.show()
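The same bottom trick can be checked in plain matplotlib, without seaborn. This is a minimal sketch of the idea rather than a replacement for the code above: each bar is given the height y - 1 and the bottom 1, so its far end lands on the original value, and values below 1 get a negative height and hang down from the baseline.

```python
import matplotlib.pyplot as plt
import numpy as np

x = ["A", "B", "C", "D", "E"]
y = np.array([5, 2, 1, 0.5, 0.2])

fig, ax = plt.subplots()
# Height is measured from the bottom, so bottom + height equals y.
bars = ax.bar(x, y - 1, bottom=1)
ax.set_yscale("log")
ax.axhline(1, color="k")

# Far end of every bar, recovered from the drawn rectangles.
bar_ends = [bar.get_y() + bar.get_height() for bar in bars]
```

Call plt.show() to display it. The tick and label formatting from the seaborn version can be applied to this Axes in the same way.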
PS: With these specific values, the plot might also work without a log scale:

Draw semicircle chart using matplotlib

Is matplotlib capable of creating semicircle charts like this:
I have tried matplotlib.pyplot.pie without success.
It doesn't seem like there is a built-in half-circle type in matplotlib. However, a workaround can be made based on matplotlib.pyplot.pie:
1. Append the total sum of the data as an extra wedge and assign white color to it, so that it fills the hidden half of the pie.
2. Overlay a white circle in the center using an Artist object (reference).
Sample Code:
import matplotlib.pyplot as plt
# data
label = ["A", "B", "C"]
val = [1,2,3]
# append data and assign color
label.append("")
val.append(sum(val)) # 50% blank
colors = ['red', 'blue', 'green', 'white']
# plot
fig = plt.figure(figsize=(8,6),dpi=100)
ax = fig.add_subplot(1,1,1)
ax.pie(val, labels=label, colors=colors)
ax.add_artist(plt.Circle((0, 0), 0.6, color='white'))
plt.show()
Output:
My solution:
import matplotlib.pyplot as plt
# data
label = ["A", "B", "C"]
val = [1,2,3]
# append data and assign color
label.append("")
val.append(sum(val)) # 50% blank
colors = ['red', 'blue', 'green', 'k']
# plot
plt.figure(figsize=(8,6),dpi=100)
wedges, labels=plt.pie(val, wedgeprops=dict(width=0.4,edgecolor='w'),labels=label, colors=colors)
# I tried this method
wedges[-1].set_visible(False)
plt.show()
Output:
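Another option, different from the pie-based workarounds above, is to draw the half ring directly from matplotlib.patches.Wedge patches. This is a minimal sketch: the helper semicircle_angles is a hypothetical name, and the radius, ring width and axis limits are arbitrary choices.

```python
import matplotlib.pyplot as plt
from matplotlib.patches import Wedge


def semicircle_angles(values):
    """Split the upper half circle (180 to 0 degrees) in proportion to values.

    Hypothetical helper: returns one (theta1, theta2) pair per value,
    ordered left to right, with theta1 < theta2 as Wedge expects.
    """
    total = sum(values)
    angles = []
    start = 180.0
    for value in values:
        end = start - 180.0 * value / total
        angles.append((end, start))
        start = end
    return angles


labels = ["A", "B", "C"]
values = [1, 2, 3]
colors = ["red", "blue", "green"]

fig, ax = plt.subplots(figsize=(8, 6))
for (theta1, theta2), color, label in zip(semicircle_angles(values), colors, labels):
    # width=0.4 turns each wedge into a ring segment of outer radius 1.
    ax.add_patch(Wedge((0, 0), 1.0, theta1, theta2, width=0.4,
                       facecolor=color, edgecolor="w", label=label))
ax.set_xlim(-1.1, 1.1)
ax.set_ylim(-0.1, 1.1)
ax.set_aspect("equal")
ax.axis("off")
ax.legend(loc="lower center", ncol=len(labels))
```

Call plt.show() to display it. Unlike ax.pie, this does not place labels around the ring, so the sketch falls back on a legend.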

seaborn exclude columns in clustering

I have a dataset containing 200 rows and 97 columns, which I have stored as a pandas dataframe.
I am plotting this dataframe with seaborn, using clustermap, like this:
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.colors import ListedColormap
sns.set(rc={'axes.facecolor':'white', 'figure.facecolor':'white'})
cmap=ListedColormap(["white", "lightgray", "blue", "red", "cornflowerblue", "darkcyan", "pink", "violet"])
g = sns.clustermap(df,method="complete", metric="hamming",row_cluster=True,
                   col_cluster=False, figsize=(10, 20), cmap=cmap)
plt.setp(g.ax_heatmap.get_yticklabels(), rotation=0)
plt.show()
However, I have just realized that I would like to plot it just like this, but I do not want the two first columns of my data frame to be included in the distance calculations.
Suggestions of how I can accomplish this?
Thanks!
Use iloc to trim the first two columns before clustering (note that this also leaves them out of the heatmap):
import matplotlib.pyplot as plt
import seaborn as sns
from matplotlib.colors import ListedColormap
sns.set(rc={'axes.facecolor':'white', 'figure.facecolor':'white'})
cmap=ListedColormap(["white", "lightgray", "blue", "red", "cornflowerblue", "darkcyan", "pink", "violet"])
g = sns.clustermap(
    df.iloc[:, 2:], method="complete", metric="hamming", row_cluster=True,
    col_cluster=False, figsize=(10, 20), cmap=cmap)
plt.setp(g.ax_heatmap.get_yticklabels(), rotation=0)
plt.show()
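If the first two columns should stay visible in the heatmap and only be left out of the distance calculation, one possibility is to compute the row linkage yourself on the trimmed columns and hand it to clustermap through its row_linkage parameter. This is a minimal sketch under that assumption; the small random DataFrame is made-up stand-in data, not the 200 x 97 dataset from the question.

```python
import numpy as np
import pandas as pd
import seaborn as sns
from scipy.cluster.hierarchy import linkage

# Made-up stand-in data: 12 rows, 6 columns of category codes 0..7.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 8, size=(12, 6)),
                  columns=[f"c{i}" for i in range(6)])

# Cluster the rows using every column except the first two.
row_linkage = linkage(df.iloc[:, 2:].to_numpy(),
                      method="complete", metric="hamming")

# Plot the full frame, but order the rows by the precomputed linkage.
g = sns.clustermap(df, row_linkage=row_linkage, col_cluster=False,
                   figsize=(4, 6))
```

The method and metric now go to scipy's linkage instead of clustermap. The cmap and tick-label settings from the question can be added to the clustermap call unchanged.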