Stacked Bar Graph with Errorbars in Pandas / Matplotlib - pandas

I want to show my data in two (or more) stacked bar graphs including error bars. My code is based on a working example, but uses DataFrames as input instead of arrays.
I tried converting the DataFrame output to an array, but that did not work:
from uncertain_panda import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
raw_data = {'': ['Error', 'Value'],'Stars': [3, 18],'Cats': [2,15],'Planets': [1,12],'Dogs': [2,16]}
df = pd.DataFrame(raw_data)
df.set_index('', inplace=True)
print(df)
N = 2
ind = np.arange(N)
width = 0.35
first_Value = df.loc[['Value'],['Cats','Dogs']]
second_Value = df.loc[['Value'],['Stars','Planets']]
first_Error = df.loc[['Error'],['Cats','Dogs']]
second_Error = df.loc[['Error'],['Stars','Planets']]
p1 = plt.bar(ind, first_Value, width, yerr=first_Error)
p2 = plt.bar(ind, second_Value, width, yerr=second_Error, bottom=first_Value)
plt.xticks(ind, ('Pets', 'Universe'))
plt.legend((p1[0], p2[0]), ('Cats', 'Dogs', 'Stars', 'Planets'))
plt.show()
I expect an output like this:
https://matplotlib.org/3.1.0/gallery/lines_bars_and_markers/bar_stacked.html#sphx-glr-gallery-lines-bars-and-markers-bar-stacked-py
Instead I get this error:
TypeError: only size-1 arrays can be converted to Python scalars
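One possible fix, offered as a sketch rather than a definitive answer: `df.loc[['Value'], [...]]` uses a list as the row selector and therefore returns a two-dimensional DataFrame, while `plt.bar` expects one-dimensional sequences for the heights, `yerr` and `bottom`. Selecting the row with a scalar label returns a Series, which `to_numpy()` turns into a 1-D array. Two things here are assumptions and not part of the question: plain pandas is used instead of `uncertain_panda`, and the columns are regrouped so that the 'Pets' bar stacks Cats and Dogs while the 'Universe' bar stacks Stars and Planets.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

raw_data = {'': ['Error', 'Value'], 'Stars': [3, 18], 'Cats': [2, 15],
            'Planets': [1, 12], 'Dogs': [2, 16]}
df = pd.DataFrame(raw_data).set_index('')

# A scalar row label ('Value', not ['Value']) returns a 1-D Series.
# Assumed grouping: bottom layer = Cats/Stars, top layer = Dogs/Planets.
bottom_value = df.loc['Value', ['Cats', 'Stars']].to_numpy(dtype=float)
top_value = df.loc['Value', ['Dogs', 'Planets']].to_numpy(dtype=float)
bottom_error = df.loc['Error', ['Cats', 'Stars']].to_numpy(dtype=float)
top_error = df.loc['Error', ['Dogs', 'Planets']].to_numpy(dtype=float)

ind = np.arange(len(bottom_value))  # one x position per stacked bar
width = 0.35

fig, ax = plt.subplots()
p1 = ax.bar(ind, bottom_value, width, yerr=bottom_error)
# bottom= shifts the second layer so it sits on top of the first one.
p2 = ax.bar(ind, top_value, width, yerr=top_error, bottom=bottom_value)
ax.set_xticks(ind)
ax.set_xticklabels(['Pets', 'Universe'])
ax.legend((p1[0], p2[0]), ('Cats / Stars', 'Dogs / Planets'))
```

Call `plt.show()` afterwards to display the figure. The legend gets one label per layer because each `bar` call produces a single handle.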

Related

How make scatterplot in pandas readable

I've been playing with Titanic dataset and working through some visualisations in Pandas using this tutorial. https://www.kdnuggets.com/2023/02/5-pandas-plotting-functions-might-know.html
I have a visual of scatterplot having used this code.
import pandas as pd
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv('train.csv')
I was confused by the bootstrap plot result, so I moved on to the scatter plot.
pd.plotting.scatter_matrix(df, figsize=(10,10), )
plt.show()
I can sort of interpret it but I'd like to put the various variables at top and bottom of every column. Is that doable?
You can use:
fig, ax = plt.subplots(4, 3, figsize=(20, 15))
sns.scatterplot(x='bedrooms', y='price', data=dataset, ax=ax[0, 0])
sns.scatterplot(x='bathrooms', y='price', data=dataset, ax=ax[0, 1])

Matplotlib - Line Plot

I am trying to plot an array of 101 rows * 12 Columns, with row #1 as a highlight using the code below:
plt.plot(HW.transpose()[1:101],color = 'grey', alpha = 0.1)
plt.plot(HW.transpose()[0],color = 'red', linewidth = 3, alpha = 0.7)
The only issue with this graph is that 'S1' somehow ends up last instead of first. What am I doing wrong?
HW.transpose()[1:101] slices rows rather than columns, so it doesn't select the desired data. You can use HW.transpose().iloc[:, 1:101] instead:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
HW = pd.DataFrame(np.random.randn(101, 12).cumsum(axis=1), columns=[f'S{i}' for i in range(1, 13)])
plt.plot(HW.transpose().iloc[:, 1:101], color='grey', alpha=0.1)
plt.plot(HW.transpose().iloc[:, 0], color='red', linewidth=3, alpha=0.7)
plt.show()
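To see why the two selections differ, here is a small sketch that only compares shapes and labels. It assumes HW is a DataFrame with columns named S1 to S12, as in the answer above; the zero-filled data is a placeholder.

```python
import numpy as np
import pandas as pd

HW = pd.DataFrame(np.zeros((101, 12)), columns=[f'S{i}' for i in range(1, 13)])
T = HW.transpose()            # 12 rows (S1..S12) x 101 columns (0..100)

row_slice = T[1:101]          # plain [] with a slice selects ROWS
col_slice = T.iloc[:, 1:101]  # iloc[:, ...] selects COLUMNS

print(row_slice.shape, list(row_slice.index[:2]))  # (11, 101) ['S2', 'S3']
print(col_slice.shape, list(col_slice.index[:2]))  # (12, 100) ['S1', 'S2']
```

With the row slice, the grey lines are plotted against S2 to S12 only, so S1 first appears on the axis when the red line is drawn afterwards, which would explain why it ends up at the end.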

Use center diverging colormap in a pandas dataframe heatmap display

I would like to use a diverging colormap to color the background of a pandas dataframe. The aspect that makes this trickier than one would think is the centering. In the example below, a red to blue colormap is used, but the middle of the colormap isn't used for values around zero. How to create a centered background color display where zero is white, all negatives are a red hue, and all positives are a blue hue?
import pandas as pd
import numpy as np
import seaborn as sns
np.random.seed(24)
df = pd.DataFrame()
df = pd.concat([df, pd.DataFrame(np.random.randn(10, 4)*10, columns=list('ABCD'))],
axis=1)
df.iloc[0, 2] = 0.0
cm = sns.diverging_palette(5, 250, as_cmap=True)
df.style.background_gradient(cmap=cm).format(precision=2)  # set_precision(2) on pandas < 1.3
The zero in the above display has a red hue and the closest to white background is used for a negative number.
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib import colors
np.random.seed(24)
df = pd.DataFrame()
df = pd.concat([df, pd.DataFrame(np.random.randn(10, 4)*10, columns=list('ABCD'))],
axis=1)
df.iloc[0, 2] = 0.0
cm = sns.diverging_palette(5, 250, as_cmap=True)
def background_gradient(s, m, M, cmap='PuBu', low=0, high=0):
    # Normalize against the fixed range [m, M] instead of each column's own min/max.
    rng = M - m
    norm = colors.Normalize(m - (rng * low),
                            M + (rng * high))
    normed = norm(s.values)
    c = [colors.rgb2hex(x) for x in plt.get_cmap(cmap)(normed)]
    return ['background-color: %s' % color for color in c]

# Symmetric limits so that zero maps to the middle of the colormap.
even_range = np.max([np.abs(df.values.min()), np.abs(df.values.max())])
df.style.apply(background_gradient,
               cmap=cm,
               m=-even_range,
               M=even_range).format(precision=2)  # set_precision(2) on pandas < 1.3
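The centering comes entirely from the symmetric limits. The following sketch isolates that step with `matplotlib.colors.Normalize`; the three sample values are made up for illustration and are not taken from the question.

```python
import numpy as np
from matplotlib import colors

values = np.array([-10.0, 0.0, 30.0])  # illustrative data only

# Limits taken from the data: zero does not land on the colormap midpoint.
data_norm = colors.Normalize(vmin=values.min(), vmax=values.max())

# Symmetric limits of +/- the largest absolute value: zero lands on the midpoint.
even_range = np.abs(values).max()
sym_norm = colors.Normalize(vmin=-even_range, vmax=even_range)

print(float(data_norm(0.0)))  # 0.25
print(float(sym_norm(0.0)))   # 0.5
```

A diverging colormap has its neutral color at 0.5, so only the symmetric normalization maps zero to it.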

Pandas histogram df.hist() group by

How to plot a histogram with pandas DataFrame.hist() using group by?
I have a data frame with 5 columns: "A", "B", "C", "D" and "Group"
There are two Groups classes: "yes" and "no"
Using:
df.hist()
I get the hist for each of the 4 columns.
Now I would like to get the same 4 graphs but with blue bars (group="yes") and red bars (group = "no").
I tried this without success:
df.hist(by = "group")
Using Seaborn
If you are open to using Seaborn, a plot with multiple subplots and multiple variables within each subplot can easily be made using seaborn.FacetGrid.
import numpy as np; np.random.seed(1)
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.randn(300,4), columns=list("ABCD"))
df["group"] = np.random.choice(["yes", "no"], p=[0.32,0.68],size=300)
df2 = pd.melt(df, id_vars='group', value_vars=list("ABCD"), value_name='value')
bins=np.linspace(df2.value.min(), df2.value.max(), 10)
g = sns.FacetGrid(df2, col="variable", hue="group", palette="Set1", col_wrap=2)
g.map(plt.hist, 'value', bins=bins, ec="k")
g.axes[-1].legend()
plt.show()
This is not the most flexible workaround but will work for your question specifically.
def sephist(col):
    yes = df[df['group'] == 'yes'][col]
    no = df[df['group'] == 'no'][col]
    return yes, no

# Subplot indices start at 1, and the columns are named 'A' to 'D'.
for num, alpha in enumerate('ABCD', start=1):
    plt.subplot(2, 2, num)
    plt.hist(sephist(alpha)[0], bins=25, alpha=0.5, label='yes', color='b')
    plt.hist(sephist(alpha)[1], bins=25, alpha=0.5, label='no', color='r')
    plt.legend(loc='upper right')
    plt.title(alpha)
plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=1.0)
You could make this more generic by:
adding a df and by parameter to sephist: def sephist(df, by, col)
making the subplots loop more flexible: for num, alpha in enumerate(df.columns)
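A minimal sketch of those two suggestions, under some assumptions: the `sephist(df, by, col)` signature comes from the text above, while returning a dict keyed by group label and skipping the grouping column in the loop are choices made here for illustration. The sample frame is made up.

```python
import pandas as pd

def sephist(df, by, col):
    """Return a dict mapping each group label in `by` to that group's `col` values."""
    return {label: group[col] for label, group in df.groupby(by)}

# Small illustrative frame (made-up data).
df = pd.DataFrame({
    'A': [1.0, 2.0, 3.0, 4.0],
    'B': [0.5, 1.5, 2.5, 3.5],
    'group': ['yes', 'no', 'yes', 'no'],
})

# Loop over the value columns only; the grouping column itself is skipped.
value_columns = [c for c in df.columns if c != 'group']
for num, col in enumerate(value_columns, start=1):
    parts = sephist(df, 'group', col)
    print(num, col, {label: list(vals) for label, vals in parts.items()})
```

Inside the loop, each entry of `parts` could be passed to `plt.hist` with its label, in the same way as the fixed yes/no version.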
Because the first argument to matplotlib.pyplot.hist can take either a single array or a sequence of arrays which are not required to be of the same length, an alternative would be:
for num, alpha in enumerate('ABCD', start=1):
    plt.subplot(2, 2, num)
    # One color per label: 'yes' in blue, 'no' in red, as asked in the question.
    plt.hist((sephist(alpha)[0], sephist(alpha)[1]), bins=25, alpha=0.5, label=['yes', 'no'], color=['b', 'r'])
    plt.legend(loc='upper right')
    plt.title(alpha)
plt.tight_layout(pad=0.4, w_pad=0.5, h_pad=1.0)
I generalized one of the other comment's solutions. Hope it helps someone out there. I added a line to ensure binning (number and range) is preserved for each column, regardless of group. The code should work for both "binary" and "categorical" groupings, i.e. "by" can specify a column wherein there are N number of unique groups. Plotting also stops if the number of columns to plot exceeds the subplot space.
import numpy as np
import matplotlib.pyplot as plt
def composite_histplot(df, columns, by, nbins=25, alpha=0.5):
    def _sephist(df, col, by):
        unique_vals = df[by].unique()
        df_by = dict()
        for uv in unique_vals:
            df_by[uv] = df[df[by] == uv][col]
        return df_by
    subplt_c = 4
    subplt_r = 5
    fig = plt.figure()
    for num, col in enumerate(columns):
        if num + 1 > subplt_c * subplt_r:
            continue
        plt.subplot(subplt_c, subplt_r, num+1)
        bins = np.linspace(df[col].min(), df[col].max(), nbins)
        for lbl, sepcol in _sephist(df, col, by).items():
            plt.hist(sepcol, bins=bins, alpha=alpha, label=lbl)
        plt.legend(loc='upper right', title=by)
        plt.title(col)
    plt.tight_layout()
    return fig
TL;DR one-liner:
It won't create subplots, but it will create four separate plots:
[df.groupby('group')[i].plot(kind='hist',title=i)[0] and plt.legend() and plt.show() for i in 'ABCD']
Full working example below
import numpy as np; np.random.seed(1)
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.randn(300,4), columns=list("ABCD"))
df["group"] = np.random.choice(["yes", "no"], p=[0.32,0.68],size=300)
[df.groupby('group')[i].plot(kind='hist',title=i)[0] and plt.legend() and plt.show() for i in 'ABCD']

Using plot_date change node icon type

When using plot_date, how do you change some of the nodes in the set from a circle to an X?
For example all nodes are circles except the 3, 8, and 19 node, which are all Xs.
I have used a sample dataset, since you didn't provide one.
import pandas as pd
import matplotlib.pyplot as plt
data = {'2014-11-15':1, '2014-11-16':2, '2014-11-17':3, '2014-11-18':5, '2014-11-19':8, '2014-11-20': 19}
df = pd.DataFrame(list(data.items()), columns=['Date', 'val'])
df = df.set_index(pd.to_datetime(df.Date, format='%Y-%m-%d'))
o_list = []
x_list = []
check_list = [3,8,19]
# Values in check_list get an X marker; all other values stay circles.
for index in df.index:
    if df.val[index] in check_list:
        x_list.append(index)
    else:
        o_list.append(index)
df_o = df.loc[o_list]
df_x = df.loc[x_list]
fig = plt.figure()
plt.plot_date(df_o.index, df_o.val, 'bo')
plt.plot_date(df_x.index, df_x.val, 'bx')
plt.show()
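As an alternative sketch, the explicit loop can be replaced by a boolean mask from `Series.isin`. It swaps `plot_date` for the general `Axes.plot`, which also accepts datetime values; this is a substitution made here, not what the question asked about, and the data is the same sample as above.

```python
import pandas as pd
import matplotlib.pyplot as plt

data = {'2014-11-15': 1, '2014-11-16': 2, '2014-11-17': 3,
        '2014-11-18': 5, '2014-11-19': 8, '2014-11-20': 19}
df = pd.DataFrame(list(data.items()), columns=['Date', 'val'])
df['Date'] = pd.to_datetime(df['Date'], format='%Y-%m-%d')

# True for the rows that should be drawn as an X.
is_x = df['val'].isin([3, 8, 19])

fig, ax = plt.subplots()
ax.plot(df.loc[~is_x, 'Date'], df.loc[~is_x, 'val'], 'bo')  # circles
ax.plot(df.loc[is_x, 'Date'], df.loc[is_x, 'val'], 'bx')    # X markers
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```

Call `plt.show()` afterwards to display the figure.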