How to combine two bar charts from two files in one diagram with matplotlib and pandas - pandas

I have two dataframes with the same columns but different content.
I have plotted the dffinal dataframe. Now I want to plot another dataframe, dffinal_no, on the same diagram so the two can be compared.
For example, one bar chart in blue and the same bar chart in another colour, differing only in the y-values.
This is the part of the code in which I plot the first dataframe.
import matplotlib.pyplot as plt
dffinal = df[['6month','final-formula','numPatients6month']].drop_duplicates().sort_values(['6month'])
ax = dffinal.plot(kind='bar', x='6month', y='final-formula')
ax2 = ax.twinx()
dffinal.plot(ax=ax2,x='6month', y='numPatients6month')
plt.show()
Now imagine I have another dataframe, dffinal_no, with the same columns. How can I plot it in the same diagram?
This is the first diagram I plotted. I want the other bar chart on this diagram in another colour.
The answer of @Mohamed Thasin ah is close to what I want, except that the right y-axis is not correct.
I want both dataframes to be plotted as (6month, final-formula); the right y-axis should only show the number of patients, as information for the user.
In other words, I do NOT want the first df to be based on final-formula and the second df on numPatients6month.
Update 1: just as a reference, this is how my dataframes look.
dffinal = df[['6month','final-formula','numPatients6month']].drop_duplicates().sort_values(['6month'])
nocidffinal = nocidf[['6month','final-formula','numPatients6month']].drop_duplicates().sort_values(['6month'])
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax2 = ax1.twinx()
ax1.set_ylabel('final-formula')
ax2.set_ylabel('numPatients6month')
width=0.4
nocidffinal=nocidffinal.set_index('6month').sort_index()
dffinal=dffinal.set_index('6month').sort_index()
nocidffinal['final-formula'].plot(kind='bar',color='green',ax=ax1,width=width,position=0)
dffinal['numPatients6month'].plot(kind='bar',color='red',ax=ax2,width=width,position=1)
dffinal content
,6month,final-formula,numPatients6month
166047.0,1,7.794117647058823,680
82972.0,2,5.720823798627003,437
107227.0,3,5.734767025089606,558
111330.0,4,4.838709677419355,434
95591.0,5,3.3707865168539324,534
95809.0,6,3.611738148984198,443
98662.0,7,3.5523978685612785,563
192668.0,8,2.9978586723768736,467
89460.0,9,0.9708737864077669,515
192585.0,10,2.1653543307086616,508
184325.0,11,1.727447216890595,521
85068.0,12,1.0438413361169103,479
nocidffinal
,6month,final-formula,numPatients6month
137797.0,1,3.5934291581108826,974
267492.0,2,2.1705426356589146,645
269542.0,3,2.2106631989596877,769
271950.0,4,2.0,650
276638.0,5,1.5587529976019185,834
187719.0,6,1.9461077844311379,668
218512.0,7,1.1406844106463878,789
199830.0,8,0.8862629246676514,677
269469.0,9,0.3807106598984772,788
293390.0,10,0.9668508287292817,724
254783.0,11,1.2195121951219512,738
300974.0,12,0.9695290858725761,722

To compare the results of two dataframes in a bar plot, one option is to concatenate the two dataframes and add a hue column.
For example, consider the dataframes below. Both contain the same x and y columns, and we want to compare their values. To do this, add a hue column to each dataframe with a distinguishing constant, as below.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df1=pd.DataFrame({'x':[1,2,3,4,5],'y':[10,2,454,121,34]})
df2=pd.DataFrame({'x':[4,1,2,5,3],'y':[54,12,65,12,8]})
df1['hue']=1
df2['hue']=2
res=pd.concat([df1,df2])
sns.barplot(x='x',y='y',data=res,hue='hue')
plt.show()
The result should look like the figure below:
To get two y-axes, try this method:
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax2 = ax1.twinx()
ax1.set_ylabel('final-formula')
ax2.set_ylabel('numPatients6month')
width=0.4
df1=df1.set_index('x').sort_index()
df2=df2.set_index('x').sort_index()
df1['y'].plot(kind='bar',color='blue',ax=ax1,width=width,position=1)
df2['y'].plot(kind='bar',color='green',ax=ax2,width=width,position=0)
plt.show()
With the actual input (df1 and df2 here stand for the two dataframes from the question):
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax2 = ax1.twinx()
ax1.set_ylabel('final-formula')
ax2.set_ylabel('numPatients6month')
width=0.4
df1=df1.set_index('6month').sort_index()
df2=df2.set_index('6month').sort_index()
df1['final-formula'].plot(kind='bar',color='blue',ax=ax1,width=width,position=1)
df2['numPatients6month'].plot(kind='bar',color='green',ax=ax2,width=width,position=0)
plt.show()
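The question's update asks for both dataframes to be plotted on final-formula, with the right axis used only to show patient counts. A minimal sketch of one way to do that, assuming the two frames share the same 6month values; the helper name plot_comparison is made up for this example, and the data is the first three rows of each frame, rounded:

```python
import matplotlib.pyplot as plt
import pandas as pd

# First three rows of each frame from the question, rounded for brevity.
dffinal = pd.DataFrame({"6month": [1, 2, 3],
                        "final-formula": [7.79, 5.72, 5.73],
                        "numPatients6month": [680, 437, 558]})
nocidffinal = pd.DataFrame({"6month": [1, 2, 3],
                            "final-formula": [3.59, 2.17, 2.21],
                            "numPatients6month": [974, 645, 769]})


def plot_comparison(first, second):
    """Grouped final-formula bars (left axis) plus patient counts (right axis)."""
    a = first.set_index("6month").sort_index()
    b = second.set_index("6month").sort_index()

    # One column per dataframe, so pandas draws the bars side by side.
    formula = pd.DataFrame({"dffinal": a["final-formula"],
                            "nocidffinal": b["final-formula"]})

    fig, ax1 = plt.subplots()
    formula.plot(kind="bar", ax=ax1, color=["blue", "green"], width=0.8)
    ax1.set_ylabel("final-formula")

    # pandas places bar groups at x = 0..n-1, so the lines use the same positions.
    ax2 = ax1.twinx()
    positions = range(len(formula))
    ax2.plot(positions, a["numPatients6month"], color="blue", marker="o", linestyle="--")
    ax2.plot(positions, b["numPatients6month"], color="green", marker="o", linestyle="--")
    ax2.set_ylabel("numPatients6month")
    return fig, ax1, ax2


fig, ax1, ax2 = plot_comparison(dffinal, nocidffinal)
```

Call plt.show() afterwards to display the figure. The patient counts are drawn as dashed lines rather than bars so they do not hide the final-formula bars.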

Related

How to mirror the bars

I have two bars which I want to mirror. I have the following code
bar1 = df['nt'].value_counts().plot.barh()
bar2 = df1['nt'].value_counts().plot.barh()
bar1.set_xlim(bar1.get_xlim()[::-1])
# bar1.yaxis.tick_right()
But somehow not only bar1 flips to the left (third line), but bar2 as well. The same happens with the commented fourth line. Why is that? How do I do it right?
df...plot.barh() doesn't return bars or a bar plot. It returns the ax, which indicates the subplot where the bar plot was added. As both bar plots are created on the same subplot, set_xlim etc. will act on that same subplot. This blogpost might be helpful.
To get two bar plots, one from the left and one from the right, you could create a twin axis with ax.twiny(), which shares the y-axis and adds a second x-axis at the top. One bar plot then uses the lower x-axis and the other uses the upper x-axis. To make things clearer, the tick labels can be colored the same as the bars. To avoid overlapping bars, the x limits should be at least the maximum of the sum of the two value_counts.
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt
df = pd.DataFrame({'nt': np.random.choice([*'abcdefhij'], 50)})
df1 = pd.DataFrame({'nt': np.random.choice([*'abcdefhij'], 50)})
# DataFrame.append was removed in pandas 2.0; pd.concat does the same job
max_sum_value_counts = pd.concat([df, df1])['nt'].value_counts().max()
fig, ax = plt.subplots(figsize=(12, 5))
df['nt'].value_counts(sort=False).sort_index().plot.barh(ax=ax, color='purple')
ax.set_xlim(0, max_sum_value_counts + 1)
ax.tick_params(labelcolor='purple')
ax1 = ax.twiny()
df1['nt'].value_counts(sort=False).sort_index().plot.barh(ax=ax1, color='crimson')
ax1.set_xlim(max_sum_value_counts + 1, 0)
ax1.tick_params(labelcolor='crimson', labelright=True, labelleft=False)
ax1.invert_yaxis()
plt.show()
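As an alternative to the twin-axis approach, each bar plot can get its own subplot, so flipping one does not affect the other. This is only a sketch with random data; the variable names counts_left and counts_right are made up for the example:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
categories = list("abcdefhij")
df = pd.DataFrame({"nt": rng.choice(categories, 50)})
df1 = pd.DataFrame({"nt": rng.choice(categories, 50)})

# Count each category and keep the same category order in both panels.
counts_left = df["nt"].value_counts().reindex(categories, fill_value=0)
counts_right = df1["nt"].value_counts().reindex(categories, fill_value=0)

fig, (ax_left, ax_right) = plt.subplots(1, 2, figsize=(12, 5))

# plot.barh returns the Axes it drew on, here the one passed in via ax=.
returned = counts_left.plot.barh(ax=ax_left, color="purple")
counts_right.plot.barh(ax=ax_right, color="crimson")

# Only the left panel is mirrored, because the right one is a separate Axes.
ax_left.invert_xaxis()
ax_left.yaxis.tick_right()
```

Call plt.show() afterwards to display the figure.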

Making a Scatter Plot from a DataFrame in Pandas

I have a DataFrame and need to make a scatter-plot from it.
I need to use 2 columns as the x-axis and y-axis and only need to plot 2 rows from the entire dataset. Any suggestions?
For example, my dataframe is below (50 states x 4 columns). I need to plot 'rgdp_change' on the x-axis vs 'diff_unemp' on the y-axis, and only need to plot for the states, "Michigan" and "Wisconsin".
So from the dataframe, you'll need to select the rows from a list of the states you want: ['Michigan', 'Wisconsin']
I also figured you would probably want a legend or some way to differentiate one point from the other. To do this, we create a colormap assigning a different color to each state. This way the code is generalizable for more than those two states.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
# generate a random df with the relevant rows, columns to your actual df
df = pd.DataFrame({'State':['Alabama', 'Alaska', 'Michigan', 'Wisconsin'], 'real_gdp':[1.75*10**5, 4.81*10**4, 2.59*10**5, 1.04*10**5],
'rgdp_change': [-0.4, 0.5, 0.4, -0.5], 'diff_unemp': [-1.3, 0.4, 0.5, -11]})
fig, ax = plt.subplots()
states = ['Michigan', 'Wisconsin']
colormap = cm.viridis
colorlist = [colors.rgb2hex(colormap(i)) for i in np.linspace(0, 0.9, len(states))]
for state, c in zip(states, colorlist):
    # select the row for this state so the point and its legend label always match
    row = df.loc[df["State"] == state]
    ax.scatter(row["rgdp_change"], row["diff_unemp"], label=state, s=50, linewidth=0.1, color=c)
ax.legend()
plt.show()
Use the dataframe plot method, but first filter the states you need with the index's isin method (this assumes the state names are the dataframe's index):
states = ["Michigan", "Wisconsin"]
df[df.index.isin(states)].plot(kind='scatter', x='rgdp_change', y='diff_unemp')
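If the state names are stored in a regular column instead of the index, the same filter can be applied to that column. A small sketch, reusing the sample values from the other answer and assuming the column is called State; the annotation step is an optional extra so the two points can be told apart:

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Michigan", "Wisconsin"],
    "rgdp_change": [-0.4, 0.5, 0.4, -0.5],
    "diff_unemp": [-1.3, 0.4, 0.5, -11],
})
states = ["Michigan", "Wisconsin"]

# Filter on the column (use df.index.isin(states) if the states are the index).
selected = df[df["State"].isin(states)]
ax = selected.plot(kind="scatter", x="rgdp_change", y="diff_unemp")

# Write the state name next to each point.
for _, row in selected.iterrows():
    ax.annotate(row["State"], (row["rgdp_change"], row["diff_unemp"]),
                textcoords="offset points", xytext=(5, 5))
```

Call plt.show() afterwards to display the figure.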

Line plot of two different grouped by dataframes

I have grouped data in 2 separate dataframes and want to plot them together with 2 separate lines in one plot.
I have grouped the data as I needed and plotted separate graphs based on the grouped data.
grouped_men = df_men.groupby('age').mean()[['oldpeak']]
grouped_women = df_women.groupby('age').mean()[['oldpeak']]
grouped_men.plot(kind='line',title='Mens age vs oldpeak')
grouped_women.plot(kind='line',title='Womens age vs oldpeak')
But now, instead of 2 separate plots, I need to plot one single graph with 2 lines, one for men and one for women.
The current plots look like this:
You need to specify axes where Pandas should put the plots. Try the following:
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
grouped_men.plot(kind='line', ax=ax, label='Mens age vs oldpeak')
grouped_women.plot(kind='line', ax=ax, label='Womens age vs oldpeak')
plt.gca().legend(title="Legend title") # Changes
plt.show()
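If both lines end up with the same legend entry (the column name oldpeak), one option is to select the column as a Series, where the label argument names the line. A minimal sketch; the df_men and df_women values here are made up for illustration:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for df_men and df_women from the question.
df_men = pd.DataFrame({"age": [40, 40, 50, 60], "oldpeak": [1.0, 2.0, 1.5, 3.0]})
df_women = pd.DataFrame({"age": [40, 50, 50, 60], "oldpeak": [0.5, 1.0, 2.0, 2.5]})

# Selecting the column before mean() gives a Series indexed by age.
grouped_men = df_men.groupby("age")["oldpeak"].mean()
grouped_women = df_women.groupby("age")["oldpeak"].mean()

fig, ax = plt.subplots()
grouped_men.plot(ax=ax, label="Men")      # both lines go on the same Axes
grouped_women.plot(ax=ax, label="Women")
ax.set_xlabel("age")
ax.set_ylabel("mean oldpeak")
ax.legend(title="Sex")
```

Call plt.show() afterwards to display the figure.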

How do I connect two sets of XY scatter values in MatPlotLib?

I am using Matplotlib to read data from an Excel file and create a scatter plot.
Here is a minimal sample table
In my scatter plot, I have two sets of XY values. In both sets, my X values are country population. I have Renewable Energy Consumed as my Y value in one set and Non-Renewable Energy Consumed in the other set.
For each Country, I would like to have a line from the renewable point to the non-renewable point.
My example code is as follows
import pandas as pd
import matplotlib.pyplot as plt
excel_file = 'example_graphs.xlsx'
datasheet = pd.read_excel(excel_file, sheet_name=0, index_col=0)
ax = datasheet.plot.scatter("Xcol","Y1col",c="b",label="set_one")
datasheet.plot.scatter("Xcol","Y2col",c="r",label="set_two", ax=ax)
plt.show()
And it produces the following plot
I would love to be able to draw a line between the two sets of points, preferably a line I can change the thickness and color of.
As commented, you could simply loop over the dataframe and plot a line for each row.
import pandas as pd
import matplotlib.pyplot as plt
datasheet = pd.DataFrame({"Xcol" : [1,2,3],
"Y1col" : [25,50,75],
"Y2col" : [75,50,25]})
ax = datasheet.plot.scatter("Xcol","Y1col",c="b",label="set_one")
datasheet.plot.scatter("Xcol","Y2col",c="r",label="set_two", ax=ax)
for n, row in datasheet.iterrows():
    ax.plot([row["Xcol"]]*2, row[["Y1col", "Y2col"]], color="limegreen", lw=3, zorder=0)
plt.show()
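Because both points of a pair share the same x value, the connectors are vertical, so the loop could also be replaced by a single ax.vlines call. This is a sketch of that alternative on the same sample data, not what the original answer used:

```python
import matplotlib.pyplot as plt
import pandas as pd

datasheet = pd.DataFrame({"Xcol": [1, 2, 3],
                          "Y1col": [25, 50, 75],
                          "Y2col": [75, 50, 25]})

ax = datasheet.plot.scatter("Xcol", "Y1col", c="b", label="set_one")
datasheet.plot.scatter("Xcol", "Y2col", c="r", label="set_two", ax=ax)

# One vertical segment per row, from (Xcol, Y1col) to (Xcol, Y2col).
# colors and linewidths control the look; zorder=0 keeps the lines behind the points.
connectors = ax.vlines(datasheet["Xcol"], datasheet["Y1col"], datasheet["Y2col"],
                       colors="limegreen", linewidths=3, zorder=0)
```

Call plt.show() afterwards to display the figure. If the two sets did not share x values, the row-by-row ax.plot loop would still be the way to go.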

How to plot 2 histograms side by side?

I have 2 dataframes. I want to plot a histogram based on a column 'rate' for each, side by side. How to do it?
I tried this:
import matplotlib.pyplot as plt
plt.subplot(1,2,1)
dflux.hist('rate' , bins=100)
plt.subplot(1,2,2)
dflux2.hist('rate' , bins=100)
plt.tight_layout()
plt.show()
It did not have the desired effect. It showed two blank charts then one populated chart.
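Use plt.subplots to create a figure with two axes, then tell hist which axis to draw on with the ax parameter. Without ax, DataFrame.hist creates a new figure of its own, which is why the subplots created with plt.subplot stayed blank.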
fig, axes = plt.subplots(1, 2)
dflux.hist('rate', bins=100, ax=axes[0])
dflux2.hist('rate', bins=100, ax=axes[1])
Demo
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
dflux = pd.DataFrame(dict(rate=np.random.randn(10000)))
dflux2 = pd.DataFrame(dict(rate=np.random.randn(10000)))
fig, axes = plt.subplots(1, 2)
dflux.hist('rate', bins=100, ax=axes[0])
dflux2.hist('rate', bins=100, ax=axes[1])
plt.show()
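If the two histograms should be directly comparable, it may help to give both the same bin edges and shared axes. A sketch with random data, assuming that is the goal; the edges variable is introduced only for this example:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
dflux = pd.DataFrame({"rate": rng.normal(0.0, 1.0, 10_000)})
dflux2 = pd.DataFrame({"rate": rng.normal(0.5, 2.0, 10_000)})

# Common bin edges computed over both columns, so bar widths and positions match.
combined = np.concatenate([dflux["rate"].to_numpy(), dflux2["rate"].to_numpy()])
edges = np.histogram_bin_edges(combined, bins=100)

# sharex/sharey give both panels the same axis limits.
fig, axes = plt.subplots(1, 2, sharex=True, sharey=True, figsize=(10, 4))
dflux.hist("rate", bins=edges, ax=axes[0])
dflux2.hist("rate", bins=edges, ax=axes[1])
axes[0].set_title("dflux")
axes[1].set_title("dflux2")
fig.tight_layout()
```

Call plt.show() afterwards to display the figure.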