pandas scatter plot and groupby does not work - pandas

I am trying to do a scatter plot with pandas. Unfortunately kind='scatter' doesn't work. If I change this to kind='line' it works as expected. What can I do to fix this?
for label, d in df.groupby('m'):
    d[['te','n']].sort_values(by='n', ascending=False).plot(kind="scatter", x='n', y='te', ax=ax, label='m = '+str(label))

Use plot.scatter instead:
import pandas as pd
df = pd.DataFrame({'x': [0, 5, 7, 3, 2, 4, 6], 'y': [0, 5, 7, 3, 2, 4, 6]})
df.plot.scatter('x', 'y')
Use this snippet if you want individual labels and colours:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({
    'm': np.random.randint(0, 5, size=100),
    'x': np.random.uniform(size=100),
    'y': np.random.uniform(size=100),
})
fig, ax = plt.subplots()
for label, d in df.groupby('m'):
    # generate a random color:
    color = list(np.random.uniform(size=3))
    d.plot.scatter('x', 'y', label=f'group {label}', ax=ax, c=[color])
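Random colours can come out nearly identical for two groups. A minimal sketch of a variant that takes one colour per group from matplotlib's built-in `tab10` colormap instead; the colormap choice and the fixed seed are assumptions made for this illustration, not part of the original answer:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)  # fixed seed, only to make the demo repeatable
df = pd.DataFrame({
    'm': rng.integers(0, 5, size=100),
    'x': rng.uniform(size=100),
    'y': rng.uniform(size=100),
})

cmap = plt.get_cmap('tab10')  # qualitative colormap with 10 distinct colours
fig, ax = plt.subplots()
for i, (label, d) in enumerate(df.groupby('m')):
    # wrap the RGBA tuple in a list so it is read as one colour, not as data
    d.plot.scatter('x', 'y', label=f'group {label}', ax=ax, c=[cmap(i % 10)])
ax.legend()
```

Call `plt.show()` afterwards to display the figure. With more than ten groups the colours repeat, so a different colormap would be needed.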

Related

matplotlib - plot merged dataframe with group bar

I am trying to plot a grouped bar chart from a merged dataframe. With the code below the bars are stacked. How can I put them side by side, as in a grouped bar chart?
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0)
df1 = pd.DataFrame({
'key': ['A', 'B', 'C', 'D'],
'value':[ 10 ,6, 6, 8]})
df2 = pd.DataFrame({
'key': ['B', 'D', 'A', 'F'],
'value':[ 3, 5, 5, 7]})
df3 = pd.merge(df1, df2, how='inner', on=['key'])
print(df1)
print(df2)
print(df3)
fig, ax = plt.subplots(figsize=(12, 8))
b1 = ax.bar(df3['key'],df3['value_x'])
b2 = ax.bar(df3['key'],df3['value_y'])
pngname = "demo.png"
fig.savefig(pngname, dpi=fig.dpi)
print("[[./%s]]"%(pngname))
Current output:
The problem is that both calls to ax.bar use the same x values. In your case these are not numbers but the keys "A", "B", "D" (the keys that survive the inner merge), so matplotlib draws the second set of bars over the first at the same positions.
There's a simple way around it, as some online tutorials show: https://www.geeksforgeeks.org/create-a-grouped-bar-plot-in-matplotlib/
Basically, you enumerate the keys, i.e. A=0, B=1, D=2. Then choose a bar width (I chose 0.4, for example), shift one group of bars to the left by bar_width/2, and shift the other group to the right by bar_width/2.
Perhaps the code explains it better than I did:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(0)
df1 = pd.DataFrame({
'key': ['A', 'B', 'C', 'D'],
'value':[ 10 ,6, 6, 8]})
df2 = pd.DataFrame({
'key': ['B', 'D', 'A', 'F'],
'value':[ 3, 5, 5, 7]})
df3 = pd.merge(df1, df2, how='inner', on=['key'])
fig, ax = plt.subplots(figsize=(12, 8))
# modifications
x = np.arange(len(df3['key'])) # enumerate the keys
bar_width = 0.4 # choose bar width
b1 = ax.bar(x - bar_width/2, df3['value_x'], width=bar_width, label='value_x') # shift x values left
b2 = ax.bar(x + bar_width/2, df3['value_y'], width=bar_width, label='value_y') # shift x values right
plt.xticks(x, df3['key']) # replace x axis ticks with keys from df3.
plt.legend(['value_x', 'value_y'])
plt.show()
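As an alternative to shifting the bars by hand, pandas can draw the grouped layout itself: `DataFrame.plot.bar` places one bar per column next to each other for every index entry. A minimal sketch using the same data, assuming it is acceptable to use 'key' as the index:

```python
import pandas as pd
import matplotlib.pyplot as plt

df1 = pd.DataFrame({'key': ['A', 'B', 'C', 'D'], 'value': [10, 6, 6, 8]})
df2 = pd.DataFrame({'key': ['B', 'D', 'A', 'F'], 'value': [3, 5, 5, 7]})
df3 = pd.merge(df1, df2, how='inner', on=['key'])

fig, ax = plt.subplots(figsize=(12, 8))
# with 'key' as the index, each remaining column becomes one bar per key
df3.set_index('key').plot.bar(ax=ax, rot=0)
```

The legend entries are taken from the column names (value_x and value_y), and `rot=0` keeps the key labels horizontal.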

Seaborn: annotate missing values on the heatmap

I am plotting a heatmap in Python with the seaborn library. The dataframe contains some missing values (NaN). I would like the heatmap cells corresponding to these values to be white (the default) and also annotated with the string NA. However, as far as I can tell, annotation does not work for missing values. Is there a workaround?
My code:
sns.heatmap(
    df,
    ax=ax[0, 0],
    cbar=False,
    annot=annot_df,
    fmt="",
    annot_kws={"size": annot_size, "va": "center_baseline"},
    cmap="coolwarm",
    linewidth=0.5,
    linecolor="black",
    vmin=-max_value,
    vmax=max_value,
    xticklabels=True,
    yticklabels=True,
)
An idea is to draw another heatmap, with a transparent color and with only values where the original dataframe is NaN. To control the axis labels, the "real" heatmap should be drawn last. Note that the color for the NaN cells is the background color of the plot.
import seaborn as sns
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.colors import ListedColormap
data = np.where(np.random.rand(7, 10) < 0.2, np.nan, np.random.rand(7, 10) * 2 - 1)
df = pd.DataFrame(data)
annot_df = df.applymap(lambda f: f'{f:.1f}')
fig, ax = plt.subplots(squeeze=False)
sns.heatmap(
    np.where(df.isna(), 0, np.nan),
    ax=ax[0, 0],
    cbar=False,
    annot=np.full_like(df, "NA", dtype=object),
    fmt="",
    annot_kws={"size": 10, "va": "center_baseline", "color": "black"},
    cmap=ListedColormap(['none']),
    linewidth=0)
sns.heatmap(
    df,
    ax=ax[0, 0],
    cbar=False,
    annot=annot_df,
    fmt="",
    annot_kws={"size": 10, "va": "center_baseline"},
    cmap="coolwarm",
    linewidth=0.5,
    linecolor="black",
    vmin=-1,
    vmax=1,
    xticklabels=True,
    yticklabels=True)
plt.show()
PS: To explicitly color the 'NA' cells, e.g. cmap=ListedColormap(['yellow']) could be used.
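Another possibility, sketched here as an assumption rather than as part of the answer above, is to draw a single heatmap and then write the "NA" labels directly with `ax.text`. Heatmap cells are one unit wide, so the cell in row r and column c is centred at (c + 0.5, r + 0.5). The random test data and seed are made up for the illustration:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

rng = np.random.default_rng(0)
data = rng.uniform(-1, 1, size=(7, 10))
data[rng.random((7, 10)) < 0.2] = np.nan  # knock out roughly 20% of the cells
df = pd.DataFrame(data)

fig, ax = plt.subplots()
sns.heatmap(df, ax=ax, cbar=False, annot=True, fmt=".1f", cmap="coolwarm",
            linewidths=0.5, linecolor="black", vmin=-1, vmax=1)

# heatmap cell (row, col) is centred at x = col + 0.5, y = row + 0.5
for row, col in zip(*np.where(df.isna())):
    ax.text(col + 0.5, row + 0.5, "NA", ha="center", va="center", color="black")
```

The NaN cells keep the background colour of the plot, as in the two-heatmap approach.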

Coloring minimum bars in seaborn FacetGrid barplot

Is there an easy way to automatically color (or mark in any way) the minimum/maximum bars in each plot of a FacetGrid?
For example, how can I mark the minimal Z value in each of the following 16 plots?
df = pd.DataFrame({'A':[10, 20, 30, 40]*4, 'Y':[1,2,3,4]*4, 'W':range(16), 'Z':range(16)})
g = sns.FacetGrid(df, row="A", col="Y", sharey=False)
g.map(sns.barplot, "W", "Z")
plt.show()
In this example only the facets on the diagonal of the grid contain data. The following approach loops through those diagonal axes, finds the minimum bar height in each, and colors the bars with that height:
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
df = pd.DataFrame({'A': [10, 20, 30, 40] * 4, 'Y': [1, 2, 3, 4] * 4, 'W': range(16), 'Z': range(16)})
g = sns.FacetGrid(df, row="A", col="Y", sharey=False)
g.map(sns.barplot, "W", "Z")
for i in range(len(g.axes)):
    ax = g.axes[i, i]
    min_height = min([p.get_height() for p in ax.patches])
    for p in ax.patches:
        if p.get_height() == min_height:
            p.set_color('red')
plt.tight_layout()
plt.show()

Error while adding error bars to subplots in seaborn

I have the following example code, which I want to plot as bar subplots in one figure using seaborn. I can plot the actual data as bar plots, but when I try to add error bars, I get the following error:
AttributeError: 'NoneType' object has no attribute 'seq'
The code is:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
df1 = pd.DataFrame({
'A': ['7.5'],
'B': ['2.4']
})
df1_err = pd.DataFrame({
'A': ['2.3'],
'B': ['1.2']
})
df2 = pd.DataFrame({
'A': ['5.5'],
'B': ['4.2']
})
df2_err = pd.DataFrame({
'A': ['1.7'],
'B': ['2.1']
})
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(6, 4), sharey=True)
my_pal = {"A": "green", "B":"orange"}
sns.set_style("whitegrid")
plt.tight_layout()
sns.barplot(data=df1, palette=my_pal, yerr = df1_err, linewidth=2,edgecolor=[".1","0.1"], ax=axes[0])
sns.barplot(data=df2, palette=my_pal, yerr = df2_err, linewidth=2,edgecolor=[".1","0.1"], ax=axes[1])
plt.show()
If I remove yerr from the sns.barplot() commands, it does create bar plots as I want, but I could not manage to add pre-calculated error bars to these subplots. Any help please?
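Maybe you mean something like this, which converts the string values to floats and draws the bars with matplotlib's Axes.bar, passing the pre-calculated errors as yerr: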
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df1 = pd.DataFrame({
'A': ['7.5'],
'B': ['2.4']
}).astype(float)
df1_err = pd.DataFrame({
'A': ['2.3'],
'B': ['1.2']
}).astype(float)
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(6, 4), sharey=True)
axes[0].bar(df1.T.index.values, np.squeeze(df1.T.values), yerr=np.squeeze(df1_err.T.values))
plt.show()
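The snippet above fills only the first subplot. A minimal sketch that extends the same idea to both dataframes from the question, assuming plain matplotlib bars are acceptable in place of sns.barplot (the capsize value is an arbitrary choice):

```python
import pandas as pd
import matplotlib.pyplot as plt

# same values as in the question, converted from strings to floats
df1 = pd.DataFrame({'A': ['7.5'], 'B': ['2.4']}).astype(float)
df1_err = pd.DataFrame({'A': ['2.3'], 'B': ['1.2']}).astype(float)
df2 = pd.DataFrame({'A': ['5.5'], 'B': ['4.2']}).astype(float)
df2_err = pd.DataFrame({'A': ['1.7'], 'B': ['2.1']}).astype(float)

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(6, 4), sharey=True)
for ax, df, err in zip(axes, (df1, df2), (df1_err, df2_err)):
    # one bar per column; the single row holds the heights and the errors
    ax.bar(list(df.columns), df.iloc[0].to_numpy(), yerr=err.iloc[0].to_numpy(),
           color=['green', 'orange'], edgecolor='0.1', linewidth=2, capsize=4)
fig.tight_layout()
```

The colors and edge settings mirror the palette and edgecolor arguments in the question.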

plot all columns of a pandas dataframe with matplotlib

I have a dataframe with a datetime index and 65 columns.
I want to plot each of these columns in its own plot,
not in a grid or all in one figure.
df= test[['one', 'two', 'three','four']]
fig, ax = plt.subplots(figsize=(24, 5))
df.plot(ax=ax)
plt.show()
The example above puts everything in one plot instead of giving each column its own plot.
You can loop over the columns of the DataFrame and create a new figure for each column. This will plot them all at once. If you want the next one to show up once the previous one is closed then put the call to plt.show() inside the for loop.
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({
'one': [1, 3, 2, 5],
'two': [9, 6, 4, 3],
'three': [0, 1, 1, 0],
'four': [-3, 2, -1, 0]})
for i, col in enumerate(df.columns):
    df[col].plot(fig=plt.figure(i))
    plt.title(col)
plt.show()
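The two behaviours described above (all figures at once, or one after the other) can be sketched with a small helper. The function name `plot_columns_separately` and its `show_each` flag are hypothetical names introduced for this illustration; the helper creates each figure explicitly with `plt.subplots()` and passes its axes to pandas:

```python
import pandas as pd
import matplotlib.pyplot as plt

def plot_columns_separately(df, show_each=False):
    """Draw every column in its own figure and return the figures."""
    figures = []
    for col in df.columns:
        fig, ax = plt.subplots()
        df[col].plot(ax=ax, title=col)
        figures.append(fig)
        if show_each:
            # with an interactive backend this waits until the window is closed
            plt.show()
    return figures

df = pd.DataFrame({
    'one': [1, 3, 2, 5],
    'two': [9, 6, 4, 3],
    'three': [0, 1, 1, 0],
    'four': [-3, 2, -1, 0]})
figures = plot_columns_separately(df)
```

Calling `plt.show()` once after this displays all four figures together, while `plot_columns_separately(df, show_each=True)` shows them one at a time.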