Plot multiple lines in a line graph using matplotlib - pandas

I am trying to plot a line graph with several lines in it, one for each group.
The x axis would be the hour and the y axis would be the count.
Since there are 3 groups in the dataframe, I will have 3 lines in a single line graph.
This is the code I have used, but I am not sure where I am going wrong.
Group Hour Count
G1 1 40
G2 1 300
G1 2 400
G2 2 80
G3 2 1211
Code used:
fig, ax = plt.subplots()
labels = []
for key, grp in df1.groupby(['Group']):
    ax = grp.plot(ax=ax, kind='line', x='x', y='y', c=key)
    labels.append(key)
lines, _ = ax.get_legend_handles_labels()
ax.legend(lines, labels, loc='best')
plt.show()

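You can use df.pivot and save yourself some lines. The arguments are passed by keyword because pandas 2.0 and later no longer accept them positionally:
df.pivot(index='Hour', columns='Group', values='Count').plot(kind='line', marker='o')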
G3 is plotted as a point because there is only one point (2 hrs, 1211 count) associated with it.
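For comparison, the loop from the question can also be made to work. The following is only a sketch, assuming the columns are named Group, Hour and Count as in the sample data: it plots those columns instead of 'x' and 'y', groups by a plain column name so that each key is a string, and passes the key as the label rather than as a colour.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Sample data copied from the question
df1 = pd.DataFrame({
    "Group": ["G1", "G2", "G1", "G2", "G3"],
    "Hour": [1, 1, 2, 2, 2],
    "Count": [40, 300, 400, 80, 1211],
})

fig, ax = plt.subplots()
# Grouping by "Group" (not ["Group"]) keeps each key a plain string
for key, grp in df1.groupby("Group"):
    # Use the real column names; the key becomes the legend label, not a colour
    grp.plot(ax=ax, kind="line", x="Hour", y="Count", label=key, marker="o")
ax.set_ylabel("Count")
ax.legend(loc="best")
```

Call plt.show() afterwards to display the figure. The marker is what makes the single G3 point visible.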

Related

Overlaying boxplots on the relative bin of a histogram

Taking the dataset 'tips' as an example:
total_bill tip smoker day time size
16.99 1.01 No Sun Dinner 2
10.34 1.66 No Sun Dinner 3
21.01 3.50 No Sun Dinner 3
23.68 3.31 No Sun Dinner 2
24.59 3.61 No Sun Dinner 4
What I'm trying to do is represent the distribution of the variable 'total_bill' and relate each of its bins to the distribution of the variable 'tip' linked to it. In this example, the graph is meant to answer the question: "What is the distribution of tips left by customers as a function of the bill they paid?"
I have more or less achieved the graph I wanted, but there is a problem, which I explain at the end.
The procedure I adopted is this:
Dividing 'total_bill' into bins.
tips['bins_total_bill'] = pd.cut(tips.total_bill, 10)
tips.head()
total_bill tip smoker day time size bins_total_bill
16.99 1.01 No Sun Dinner 2 (12.618, 17.392]
10.34 1.66 No Sun Dinner 3 (7.844, 12.618]
21.01 3.50 No Sun Dinner 3 (17.392, 22.166]
23.68 3.31 No Sun Dinner 2 (22.166, 26.94]
24.59 3.61 No Sun Dinner 4 (22.166, 26.94]
Creation of a pd.Series with:
Index: pd.Interval bins of total_bill
Values: number of occurrences
s = tips['bins_total_bill'].value_counts(sort=False)
s
(3.022, 7.844] 7
(7.844, 12.618] 42
(12.618, 17.392] 68
(17.392, 22.166] 51
(22.166, 26.94] 31
(26.94, 31.714] 19
(31.714, 36.488] 12
(36.488, 41.262] 7
(41.262, 46.036] 3
(46.036, 50.81] 4
Name: bins_total_bill, dtype: int64
Combine barplot and boxplot together
fig, ax1 = plt.subplots(dpi=200)
ax2 = ax1.twinx()
sns.barplot(ax=ax1, x = s.index, y = s.values)
sns.boxplot(ax=ax2, x='bins_total_bill', y='tip', data=tips)
sns.stripplot(ax=ax2, x='bins_total_bill', y='tip', data=tips, size=5, color="yellow", edgecolor='red', linewidth=0.3)
#Title and axis labels
ax1.tick_params(axis='x', rotation=90)
ax1.set_ylabel('Number of bills')
ax2.set_ylabel('Tips [$]')
ax1.set_xlabel("Mid value of total_bill bins [$]")
ax1.set_title("Tips ~ Total_bill distribution")
#Reference lines average(tip) + add yticks + Legend
avg_tip = np.mean(tips.tip)
ax2.axhline(y=avg_tip, color='red', linestyle="--", label="avg tip")
ax2.set_yticks(list(ax2.get_yticks()) + [avg_tip]) # append avg(tip) as an extra tick instead of shifting every tick
ax2.legend(loc='best')
#Set labels axis x
ax1.set_xticklabels(list(map(lambda s: round(s.mid,2), s.index)))
It has to be said that this graph has a problem! As the x-axis is categorical, I cannot, for example, add a vertical line at the mean value of 'total_bill'.
How can I fix this to get the correct result?
I also wonder if there is a correct and more streamlined approach than the one I have adopted.
I thought of this method, which is more compact than the previous one (it can probably be done better) and overcomes the problem of scaling on the x-axis.
I split 'total_bill' into bins and add the column to the DataFrame
tips['bins_total_bill'] = pd.cut(tips.total_bill, 10)
Group column 'tip' by previously created bins
obj_gby_tips = tips.groupby('bins_total_bill')['tip']
gby_tip = dict(list(obj_gby_tips))
Create dictionary with:
keys: midpoint of each bins interval
values: gby tips for each interval
mid_total_bill_bins = list(map(lambda bins: bins.mid, list(gby_tip.keys())))
gby_tips = gby_tip.values()
tip_gby_total_bill_bins = dict(zip(mid_total_bill_bins, gby_tips))
Create the chart by passing to each box of the boxplot the midpoint of its respective bin
fig, ax1 = plt.subplots(dpi=200)
ax2 = ax1.twinx()
bp_values = list(tip_gby_total_bill_bins.values())
bp_pos = list(tip_gby_total_bill_bins.keys())
l1 = sns.histplot(tips.total_bill, bins=10, ax=ax1)
l2 = ax2.boxplot(bp_values, positions=bp_pos, manage_ticks=False, patch_artist=True, widths=2)
#Average tips as hline
avg_tip = np.mean(tips.tip)
ax2.axhline(y=avg_tip, color='red', linestyle="--", label="avg tip")
ax2.set_yticks(list(ax2.get_yticks()) + [avg_tip]) # append avg(tip) as an extra tick on the y-axis
#Average total_bill as vline
avg_total_bill=np.mean(tips.total_bill)
ax1.axvline(x=avg_total_bill, color='orange', linestyle="--", label="avg tot_bill")
then the result.
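The same idea can be condensed by computing one explicit set of bin edges and using it for both the histogram and the grouping, so that bars and boxes line up by construction. This is a minimal sketch with plain matplotlib; the data is synthetic (an assumption standing in for the tips dataset) and the names edges, mids and bin_idx are illustrative only.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the tips data (assumption: not the real dataset)
rng = np.random.default_rng(0)
total_bill = rng.gamma(shape=4.0, scale=5.0, size=200)
tip = 0.15 * total_bill + rng.normal(0.0, 0.5, size=200)
df = pd.DataFrame({"total_bill": total_bill, "tip": tip})

# One set of 10 equal-width bins shared by the histogram and the grouping
edges = np.linspace(df["total_bill"].min(), df["total_bill"].max(), 11)
mids = (edges[:-1] + edges[1:]) / 2
bin_idx = pd.cut(df["total_bill"], bins=edges, include_lowest=True, labels=False)

# Collect the tips of every non-empty bin together with the bin midpoint
positions, values = [], []
for i, grp in df.groupby(bin_idx):
    positions.append(mids[int(i)])
    values.append(grp["tip"].to_numpy())

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
ax1.hist(df["total_bill"], bins=edges, color="lightgray", edgecolor="white")
# Numeric positions keep the x-axis continuous, so vertical lines are possible
ax2.boxplot(values, positions=positions, widths=0.6 * (edges[1] - edges[0]),
            manage_ticks=False)
ax1.axvline(df["total_bill"].mean(), color="orange", linestyle="--",
            label="avg total_bill")
ax1.set_xlabel("total_bill [$]")
ax1.set_ylabel("Number of bills")
ax2.set_ylabel("Tips [$]")
ax1.legend(loc="best")
```

Because the x-axis stays numeric, the vertical line at the mean of total_bill lands at the right place.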

Joint plot for groupby datas on seaborn

I have a dataframe that looks like this:
In[1]: df.head()
Out[1]:
dataset x y
1 56 45
1 31 67
7 22 85
2 90 45
2 15 42
There are about 4000 more rows. x and y are grouped by dataset. I am trying to plot a jointplot for each dataset separately using seaborn. This is what I have come up with so far:
import seaborn as sns
g = sns.FacetGrid(df, col="dataset", col_wrap=3)
g.map_dataframe(sns.scatterplot, x="x", y="y", color = "#7db4a2")
g.map_dataframe(sns.histplot, x="x", color = "#7db4a2")
g.map_dataframe(sns.histplot, y="y", color = "#7db4a2")
g.add_legend();
but they all overlap. How do I make a proper jointplot for each dataset in a subplot? Thank you in advance and cheers!
You can use groupby on your dataset column, then create a sns.JointGrid() for each group, and finally add your scatter plot and KDE plot to that JointGrid.
Here is an example using a seeded NumPy random generator. I made three "datasets" with random x, y values. See the Seaborn JointGrid documentation for ways to customize colors, etc.
import numpy as np
import pandas as pd
import seaborn as sns
### Build an example dataset
np.random.seed(seed=1)
ds = (np.arange(3)).tolist()*10
x = np.random.randint(100, size=(60)).tolist()
y = np.random.randint(20, size=(60)).tolist()
df = pd.DataFrame(data=zip(ds, x, y), columns=["ds", "x", "y"])
### The plots
for _ds, group in df.groupby('ds'):
    group = group.copy()
    # Each JointGrid creates its own figure: scatter in the joint axes, KDE in the margins
    g = sns.JointGrid(data=group, x='x', y='y')
    g.plot(sns.scatterplot, sns.kdeplot)
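Since every JointGrid creates a separate figure, the loop above produces one window per dataset. If all datasets should share a single figure, one possible alternative (not seaborn's JointGrid, but a hand-built joint layout with plain matplotlib subfigures, which need Matplotlib 3.4 or later) is sketched below. The data is synthetic and the names joint_axes, ax_top and ax_right are illustrative.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic example: three datasets with 20 points each (assumption)
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "ds": np.repeat([0, 1, 2], 20),
    "x": rng.integers(0, 100, size=60),
    "y": rng.integers(0, 20, size=60),
})

groups = list(df.groupby("ds"))
fig = plt.figure(figsize=(4 * len(groups), 4))
subfigs = np.atleast_1d(fig.subfigures(1, len(groups)))

joint_axes = []
for subfig, (name, group) in zip(subfigs, groups):
    # 2x2 grid per dataset: big joint axes plus narrow marginal axes
    gs = subfig.add_gridspec(2, 2, width_ratios=(4, 1), height_ratios=(1, 4),
                             wspace=0.05, hspace=0.05)
    ax_joint = subfig.add_subplot(gs[1, 0])
    ax_top = subfig.add_subplot(gs[0, 0], sharex=ax_joint)
    ax_right = subfig.add_subplot(gs[1, 1], sharey=ax_joint)

    ax_joint.scatter(group["x"], group["y"], s=10, color="#7db4a2")
    ax_top.hist(group["x"], bins=10, color="#7db4a2")
    ax_right.hist(group["y"], bins=10, orientation="horizontal", color="#7db4a2")

    # Hide the tick labels that the marginal axes share with the joint axes
    ax_top.tick_params(labelbottom=False)
    ax_right.tick_params(labelleft=False)
    subfig.suptitle(f"dataset {name}")
    joint_axes.append(ax_joint)
```

The marginal plots here are histograms rather than KDEs, which keeps the sketch free of extra dependencies.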

Subplot multiindex data by level

This is my multiindex data.
Month Hour Hi
1 9 84.39
10 380.41
11 539.06
12 588.70
13 570.62
14 507.42
15 340.42
16 88.91
2 8 69.31
9 285.13
10 474.95
11 564.42
12 600.11
13 614.36
14 539.79
15 443.93
16 251.57
17 70.51
I want to make subplots where each subplot represents a Month. The x axis is the hour and the y axis is Hi for the respective month.
This gives a nice approach, as follows:
levels = df.index.levels[0]
fig, axes = plt.subplots(len(levels), figsize=(3, 25))
for level, ax in zip(levels, axes):
    df.loc[level].plot(ax=ax, title=str(level))
plt.tight_layout()
I want to make a 1x2 grid of subplots instead of the vertical arrangement above. Later, with larger data, I want to make a 3x4 grid or an even larger one.
How can I do it?
You can do it in pandas
df.Hi.unstack(0).fillna(0).plot(kind='line',subplots=True, layout=(1,2))
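To see what the one-liner does before plotting, here is a small sketch using a few rows from the question. unstack(0) moves the first index level (Month) into the columns, so each month becomes one column that pandas can draw in its own subplot. Hours missing for a month become NaN, which fillna(0) then turns into zeros.

```python
import pandas as pd

# A few rows copied from the question's data
idx = pd.MultiIndex.from_tuples(
    [(1, 9), (1, 10), (1, 11), (2, 8), (2, 9), (2, 10)],
    names=["Month", "Hour"],
)
df = pd.DataFrame({"Hi": [84.39, 380.41, 539.06, 69.31, 285.13, 474.95]}, index=idx)

wide = df["Hi"].unstack(0)  # rows: Hour, columns: Month
filled = wide.fillna(0)     # hours absent in a month are plotted as 0
print(wide)
```

If the zeros are not wanted, dropping fillna(0) should leave gaps in the lines instead, since NaN values are not drawn.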
Pass the rows and columns arguments to plt.subplots:
levels = df.index.levels[0]
# plt.subplots(nrows, ncols): 1 row, one column per level
fig, axes = plt.subplots(1, len(levels), figsize=(6, 3))
for level, ax in zip(levels, axes):
    df.loc[level].plot(ax=ax, title=str(level))
plt.tight_layout()
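For the larger layouts mentioned in the question (3x4 and beyond), the same loop can run over a flattened 2-D array of axes. This is a sketch under assumptions: the data is synthetic, ncols is an illustrative parameter, and unused cells of the grid are simply hidden.

```python
import math

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic data: 5 months, hours 8 to 17 (assumption, not the question's data)
rng = np.random.default_rng(0)
idx = pd.MultiIndex.from_product([range(1, 6), range(8, 18)], names=["Month", "Hour"])
df = pd.DataFrame({"Hi": rng.uniform(50, 600, size=len(idx))}, index=idx)

levels = df.index.levels[0]
ncols = 3                                  # chosen number of columns
nrows = math.ceil(len(levels) / ncols)     # enough rows to fit every month

# squeeze=False always returns a 2-D array, even for a single row or column
fig, axes = plt.subplots(nrows, ncols, figsize=(4 * ncols, 3 * nrows), squeeze=False)
for level, ax in zip(levels, axes.flat):
    df.loc[level].plot(ax=ax, title=str(level), legend=False)

# Hide the cells that have no month assigned
for ax in axes.flat[len(levels):]:
    ax.set_visible(False)
fig.tight_layout()
```

With 12 months and ncols = 4, the same code would produce the 3x4 grid.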

Customizing subplots in matplotlib

I want to place 3 plots using subplots. Two plots on the top row and one plot that will occupy the entire second row.
My code creates a gap between the top two plots and the lower plot. How can I correct that?
df_CI
Country China India
1980 5123 8880
1981 6682 8670
1982 3308 8147
1983 1863 7338
1984 1527 5704
fig = plt.figure() # create figure
ax0 = fig.add_subplot(221) # add subplot 1 (2 row, 2 columns, first plot)
ax1 = fig.add_subplot(222) # add subplot 2 (2 row, 2 columns, second plot).
ax2 = fig.add_subplot(313) # a 3 digit number where the hundreds represent nrows, the tens represent ncols
# and the units represent plot_number.
# Subplot 1: Box plot
df_CI.plot(kind='box', color='blue', vert=False, figsize=(20, 20), ax=ax0) # add to subplot 1
ax0.set_title('Box Plots of Immigrants from China and India (1980 - 2013)')
ax0.set_xlabel('Number of Immigrants')
ax0.set_ylabel('Countries')
# Subplot 2: Line plot
df_CI.plot(kind='line', figsize=(20, 20), ax=ax1) # add to subplot 2
ax1.set_title ('Line Plots of Immigrants from China and India (1980 - 2013)')
ax1.set_ylabel('Number of Immigrants')
ax1.set_xlabel('Years')
# Subplot 3: Bar plot
df_CI.plot(kind='bar', figsize=(20, 20), ax=ax2) # add to subplot 3
ax0.set_title('Box Plots of Immigrants from China and India (1980 - 2013)')
ax0.set_xlabel('Number of Immigrants')
ax0.set_ylabel('Countries')
plt.show()
I've always found subplots syntax a little difficult.
With these calls
ax0 = fig.add_subplot(221)
ax1 = fig.add_subplot(222)
you're dividing your figure into a 2x2 grid and filling the first row.
ax2 = fig.add_subplot(313)
Now you're dividing it into three rows and filling the last one.
You're basically creating two independent subplot grids, and there is no easy way to define how the subplots of one are spaced with respect to the other.
A much easier and more Pythonic way is to use gridspec to create a single finer grid and address it with Python slicing.
import matplotlib as mpl
import matplotlib.pyplot as plt
fig = plt.figure()
gs = mpl.gridspec.GridSpec(2, 2, wspace=0.25, hspace=0.25) # 2x2 grid
ax0 = fig.add_subplot(gs[0, 0]) # first row, first col
ax1 = fig.add_subplot(gs[0, 1]) # first row, second col
ax2 = fig.add_subplot(gs[1, :]) # full second row
And now you can also easily tune spacing with wspace and hspace.
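As a side note, recent Matplotlib versions (3.3 and later) also offer plt.subplot_mosaic, which describes the same layout with labels instead of slices. This is a different technique from the gridspec answer, shown only as a sketch. The data is the five rows from the question, and the labels "box", "line" and "bar" are arbitrary names chosen here.

```python
import matplotlib.pyplot as plt
import pandas as pd

# The five rows shown in the question
df_CI = pd.DataFrame(
    {"China": [5123, 6682, 3308, 1863, 1527],
     "India": [8880, 8670, 8147, 7338, 5704]},
    index=[1980, 1981, 1982, 1983, 1984],
)

# Repeating a label makes that axes span all the cells it appears in
fig, axd = plt.subplot_mosaic(
    [["box", "line"],
     ["bar", "bar"]],
    figsize=(10, 8),
)
df_CI.plot(kind="box", ax=axd["box"])
df_CI.plot(kind="line", ax=axd["line"])
df_CI.plot(kind="bar", ax=axd["bar"])
fig.tight_layout()
```

The returned axd is a dictionary keyed by label, so each plot is addressed by name rather than by grid position.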
More complex layouts are also a lot easier; it's just the familiar slicing syntax.
fig = plt.figure()
gs = mpl.gridspec.GridSpec(10, 10, wspace=0.25, hspace=0.25)
fig.add_subplot(gs[2:8, 2:8])
fig.add_subplot(gs[0, :])
for i in range(5):
    fig.add_subplot(gs[1, (i*2):(i*2+2)])
fig.add_subplot(gs[2:, :2])
fig.add_subplot(gs[8:, 2:4])
fig.add_subplot(gs[8:, 4:9])
fig.add_subplot(gs[2:8, 8])
fig.add_subplot(gs[2:, 9])
fig.add_subplot(gs[3:6, 3:6])
# fancy colors
cmap = mpl.colormaps["viridis"] # mpl.cm.get_cmap was removed in Matplotlib 3.9
naxes = len(fig.axes)
for i, ax in enumerate(fig.axes):
    ax.set_xticks([])
    ax.set_yticks([])
    ax.set_facecolor(cmap(float(i)/(naxes-1)))

plot ordering/layering julia pyplot

I have a subplot that plots a line (x,y) and a particular point (xx,yy). I want to highlight (xx,yy), so I've plotted it with scatter. However, even if I order it after the original plot, the new point still shows up behind the original line. How can I fix this? MWE below.
x = 1:10
y = 1:10
xx = 5
yy = 5
fig, ax = subplots()
ax[:plot](x,y)
ax[:scatter](xx,yy, color="red", label="h_star", s=100)
legend()
xlabel("x")
ylabel("y")
title("test")
grid("on")
You can change which plots are displayed on top of each other with the argument zorder. The matplotlib example shown here gives a brief explanation:
The default drawing order for axes is patches, lines, text. This
order is determined by the zorder attribute. The following defaults
are set
Artist Z-order
Patch / PatchCollection 1
Line2D / LineCollection 2
Text 3
You can change the order for individual artists by setting the zorder.
Any individual plot() call can set a value for the zorder of that
particular item.
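The defaults quoted above can be inspected directly, which shows why the call order does not matter here: scatter returns a collection, which sits in the lower default layer, while plot returns a Line2D in the layer above it. A minimal sketch (the variable names are illustrative):

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
(line,) = ax.plot(range(1, 11), range(1, 11))
point = ax.scatter(5, 5, color="red", s=100)

# Defaults: the line is drawn above the scatter collection
line_z, point_z = line.get_zorder(), point.get_zorder()
print(line_z, point_z)

# Raise the point just above the line, whatever the line's zorder is
point.set_zorder(line.get_zorder() + 1)
```

Setting the zorder relative to the line avoids hard-coding a number that may need changing if other artists are added later.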
A full example based on the code in the question, written in Python, is shown below:
import matplotlib.pyplot as plt
x = range(1, 11) # matches Julia's 1:10, which includes 10
y = range(1, 11)
xx = 5
yy = 5
fig, ax = plt.subplots()
ax.plot(x,y)
# could set zorder very high, say 10, to "make sure" it will be on the top
ax.scatter(xx,yy, color="red", label="h_star", s=100, zorder=3)
plt.legend()
plt.xlabel("x")
plt.ylabel("y")
plt.title("test")
plt.grid(True)
plt.show()
Which gives: