How do I connect two sets of XY scatter values in Matplotlib? - pandas

I am using pandas and Matplotlib to read data from an Excel file and create a scatter plot.
Here is a minimal sample table
In my scatter plot, I have two sets of XY values. In both sets, my X values are country population. I have Renewable Energy Consumed as my Y value in one set and Non-Renewable Energy Consumed in the other set.
For each Country, I would like to have a line from the renewable point to the non-renewable point.
My example code is as follows
import pandas as pd
import matplotlib.pyplot as plt
excel_file = 'example_graphs.xlsx'
datasheet = pd.read_excel(excel_file, sheet_name=0, index_col=0)
ax = datasheet.plot.scatter("Xcol","Y1col",c="b",label="set_one")
datasheet.plot.scatter("Xcol","Y2col",c="r",label="set_two", ax=ax)
plt.show()
And it produces the following plot
I would love to be able to draw a line between the two sets of points, preferably a line I can change the thickness and color of.

As commented, you could simply loop over the dataframe and plot a line for each row.
import pandas as pd
import matplotlib.pyplot as plt
datasheet = pd.DataFrame({"Xcol" : [1,2,3],
"Y1col" : [25,50,75],
"Y2col" : [75,50,25]})
ax = datasheet.plot.scatter("Xcol","Y1col",c="b",label="set_one")
datasheet.plot.scatter("Xcol","Y2col",c="r",label="set_two", ax=ax)
for n,row in datasheet.iterrows():
    ax.plot([row["Xcol"]]*2,row[["Y1col", "Y2col"]], color="limegreen", lw=3, zorder=0)
plt.show()
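Because both sets share the same X value in each row, the connecting lines are all vertical, so the loop can also be replaced by a single call to `ax.vlines`. This is only a sketch of that alternative, using the same sample data as above; the `Agg` backend line is an assumption made so the snippet runs without a display and can be dropped for interactive use.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

datasheet = pd.DataFrame({"Xcol": [1, 2, 3],
                          "Y1col": [25, 50, 75],
                          "Y2col": [75, 50, 25]})

fig, ax = plt.subplots()
ax.scatter(datasheet["Xcol"], datasheet["Y1col"], c="b", label="set_one")
ax.scatter(datasheet["Xcol"], datasheet["Y2col"], c="r", label="set_two")

# One vertical segment per row, from Y1col to Y2col at the shared X value.
# colors and linewidth control the appearance; zorder=0 keeps lines behind the points.
connectors = ax.vlines(datasheet["Xcol"], datasheet["Y1col"], datasheet["Y2col"],
                       colors="limegreen", linewidth=3, zorder=0)
ax.legend()
```

Call `plt.show()` or `fig.savefig(...)` afterwards to view the result. If the two sets did not share X values, the per-row `ax.plot` loop would still be needed.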

Related

How do I plot multiple line in 1 graph from multiple data in dataframe

I have some data in a dataframe and want to plot all of it on one graph. How do I do that?
I plotted the columns one by one using the code below, and here are the results:
df.plot(x='MONTH', y='MONTHLY INCOME')
df.plot(x='MONTH', y='MONTHLY EXPENSES')
df.plot(x='MONTH', y='MONTHLY SAVINGS')
Graphs: https://i.stack.imgur.com/59kpV.png
Dataframe: https://i.stack.imgur.com/vPiFO.png
Try using matplotlib directly. Successive plt.plot calls draw onto the same axes.
import matplotlib.pyplot as plt
legend_labels = ['INCOME', 'EXPENSES', 'SAVINGS']
plt.plot(df['MONTH'], df['MONTHLY INCOME'])
plt.plot(df['MONTH'], df['MONTHLY EXPENSES'])
plt.plot(df['MONTH'], df['MONTHLY SAVINGS'])
plt.legend(legend_labels)
plt.show()
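pandas can also draw several columns on one axes by passing a list to `y`. The following is a minimal sketch; the numbers are hypothetical, since the original dataframe is only available as an image, and the `Agg` backend line is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical values standing in for the dataframe shown in the question.
df = pd.DataFrame({
    "MONTH": [1, 2, 3, 4],
    "MONTHLY INCOME": [5000, 5200, 5100, 5300],
    "MONTHLY EXPENSES": [3000, 3500, 3200, 3100],
    "MONTHLY SAVINGS": [2000, 1700, 1900, 2200],
})

# A list for y puts all three lines on the same axes, with a legend entry per column.
ax = df.plot(x="MONTH",
             y=["MONTHLY INCOME", "MONTHLY EXPENSES", "MONTHLY SAVINGS"])
ax.set_ylabel("Amount")
```

An alternative with the same effect is to keep the three separate `df.plot` calls and pass `ax=ax` to the second and third.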

Pandas histogram plot with Y axis or colorbar

In Pandas, I am trying to generate a Ridgeline plot in which the density values are shown (either as a Y axis or as a color ramp). I am using Joyplot, but any alternative is fine.
So, first I created the Ridge plot to show the different distribution plot for each condition (you can reproduce it using this code):
import numpy as np
import pandas as pd
import joypy
import matplotlib
import matplotlib.pyplot as plt
df1 = pd.DataFrame({'Category1':np.random.choice(['C1','C2','C3'],1000),'Category2':np.random.choice(['B1','B2','B3','B4','B5'],1000),
'year':np.arange(start=1900, stop=2900, step=1),
'Data':np.random.uniform(0,1,1000),"Period":np.random.choice(['AA','CC','BB','DD'],1000)})
data_pivot=df1.pivot_table('Data', ['Category1', 'Category2','year'], 'Period')
fig, axes = joypy.joyplot(data_pivot, column=['AA', 'BB', 'CC', 'DD'], by="Category1", ylim='own', figsize=(14,10), legend=True, alpha=0.4)
This generates the figure, but without the Y axis I want. Based on this post, I could add a color ramp, but it neither makes sense nor shows the differences between the distributions of the different categories on each line :) ...
ar=df1['Data'].plot.kde().get_lines()[0].get_ydata() ## a workaround to get the probability values to set the colorramp max and min
norm = plt.Normalize(ar.min(), ar.max())
original_cmap = plt.cm.viridis
cmap = matplotlib.colors.ListedColormap(original_cmap(norm(ar)))
sm = matplotlib.cm.ScalarMappable(cmap=original_cmap, norm=norm)
sm.set_array([])
# plotting ....
fig, axes = joypy.joyplot(data_pivot,colormap = cmap , column=['AA', 'BB', 'CC', 'DD'], by="Category1", ylim='own', figsize=(14,10), legend=True, alpha=0.4)
fig.colorbar(sm, ax=axes, label="density")
But what I want is something like either of these figures (preferably with a color ramp):

Making a Scatter Plot from a DataFrame in Pandas

I have a DataFrame and need to make a scatter-plot from it.
I need to use 2 columns as the x-axis and y-axis and only need to plot 2 rows from the entire dataset. Any suggestions?
For example, my dataframe is below (50 states x 4 columns). I need to plot 'rgdp_change' on the x-axis vs 'diff_unemp' on the y-axis, and only need to plot for the states, "Michigan" and "Wisconsin".
So from the dataframe, you'll need to select the rows from a list of the states you want: ['Michigan', 'Wisconsin']
I also figured you would probably want a legend or some way to differentiate one point from the other. To do this, we create a colormap assigning a different color to each state. This way the code is generalizable for more than those two states.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
# generate a random df with the relevant rows, columns to your actual df
df = pd.DataFrame({'State':['Alabama', 'Alaska', 'Michigan', 'Wisconsin'], 'real_gdp':[1.75*10**5, 4.81*10**4, 2.59*10**5, 1.04*10**5],
'rgdp_change': [-0.4, 0.5, 0.4, -0.5], 'diff_unemp': [-1.3, 0.4, 0.5, -11]})
fig, ax = plt.subplots()
states = ['Michigan', 'Wisconsin']
colormap = cm.viridis
colorlist = [colors.rgb2hex(colormap(i)) for i in np.linspace(0, 0.9, len(states))]
for i,c in enumerate(colorlist):
    # pick the i-th selected state's values (rows keep the dataframe's order)
    x = df.loc[df["State"].isin(states)].rgdp_change.values[i]
    y = df.loc[df["State"].isin(states)].diff_unemp.values[i]
    legend_label = states[i]
    ax.scatter(x, y, label=legend_label, s=50, linewidth=0.1, c=c)
ax.legend()
plt.show()
Use the dataframe plot method, but first filter the states you need with the index's isin method (this assumes the state names are the dataframe's index):
states = ["Michigan", "Wisconsin"]
df[df.index.isin(states)].plot(kind='scatter', x='rgdp_change', y='diff_unemp')
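If the state names live in an ordinary column rather than the index, as in the sample frame of the first answer, the same filter can be applied to that column. This is a small sketch under that assumption; the `Agg` backend line is only there so it runs without a display, and the annotation loop is an optional extra for telling the two points apart.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Sample frame from the first answer, with State as a regular column.
df = pd.DataFrame({"State": ["Alabama", "Alaska", "Michigan", "Wisconsin"],
                   "rgdp_change": [-0.4, 0.5, 0.4, -0.5],
                   "diff_unemp": [-1.3, 0.4, 0.5, -11]})

states = ["Michigan", "Wisconsin"]
subset = df[df["State"].isin(states)]  # boolean mask on the column

ax = subset.plot(kind="scatter", x="rgdp_change", y="diff_unemp")

# Label each point with its state name.
for _, row in subset.iterrows():
    ax.annotate(row["State"], (row["rgdp_change"], row["diff_unemp"]))
```

Calling `df.set_index("State")` first would make the index-based filter from the answer work unchanged.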

Stacked barplot in pandas- read from dataframe?

I am trying to create a stacked barplot using a data frame I have created that looks like this
I want the stacked bar chart to show the 'types of exploitation' on the x axis, and then the male and female figures stacked on top of each other under these headings.
Is there a way to do this reading the info from my df? I have read about creating an index to do this but do not understand if this is the solution?
I also need a legend showing 'male' and 'female'
You can stack bars on top of each other with the bottom parameter of matplotlib's bar function.
Step 1: Create dataframe and import packages
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import rc
d = {'male': [37,1032,1], 'female': [96,134,1]}
df = pd.DataFrame(data=d, index=['a', 'b', 'c'])
Step 2: Create graph
r = [0,1,2]
bars1 = df['female']
bars2 = df['male']
plt.bar(r, bars1)
plt.bar(r, bars2,bottom=bars1, color='#557f2d')
plt.xticks(r, df.index, fontweight='bold')
plt.legend(labels = ['female', 'male'])
plt.show()
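Since the question asks about plotting straight from the df, it may be worth noting that pandas can stack the columns itself with `stacked=True`. The sketch below reuses the sample frame from Step 1; the index labels 'a', 'b', 'c' stand in for the real types of exploitation, and the `Agg` backend line is an assumption for running without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

d = {"male": [37, 1032, 1], "female": [96, 134, 1]}
df = pd.DataFrame(data=d, index=["a", "b", "c"])

# Column order sets stacking order: female at the bottom, male on top.
# The index becomes the x-axis categories and the column names become the legend.
ax = df[["female", "male"]].plot(kind="bar", stacked=True, rot=0)
ax.set_xlabel("types of exploitation")
```

This draws the same chart as the manual `bottom` approach, with the legend showing 'female' and 'male' taken from the column names.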

how to combine two bar chart of two files in one diagram in matplotlib pandas

I have two dataframe with the same columns but different content.
I have plotted the dffinal data frame. Now I want to plot another data frame, dffinal_no, on the same diagram so the two can be compared.
For example, one bar chart in blue and the same bar chart in another colour, differing only in the y values.
This is part of the code in which I have plotted the first data frame.
dffinal = df[['6month','final-formula','numPatients6month']].drop_duplicates().sort_values(['6month'])
ax=dffinal.plot(kind='bar',x='6month', y='final-formula')
import matplotlib.pyplot as plt
ax2 = ax.twinx()
dffinal.plot(ax=ax2,x='6month', y='numPatients6month')
plt.show()
Now imagine I have another dffinal_no data frame with the same columns, how can I plot it in the same diagram?
This is my first diagram which I plotted, I want the other bar chart on this diagram with another color.
The answer of @Mohamed Thasin ah is almost what I want, except that the right y-axis is not correct.
I want both data frames to be based on (6month, final-formula); the right y-axis should just show the number of patients, as information for the user.
In other words, I do NOT want the first df based on final-formula and the second df based on the number of patients.
Update 1: just as a reference, this is what my data frames look like
dffinal = df[['6month','final-formula','numPatients6month']].drop_duplicates().sort_values(['6month'])
nocidffinal = nocidf[['6month','final-formula','numPatients6month']].drop_duplicates().sort_values(['6month'])
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax2 = ax1.twinx()
ax1.set_ylabel('final-formula')
ax2.set_ylabel('numPatients6month')
width=0.4
nocidffinal=nocidffinal.set_index('6month').sort_index()
dffinal=dffinal.set_index('6month').sort_index()
nocidffinal['final-formula'].plot(kind='bar',color='green',ax=ax1,width=width,position=0)
dffinal['numPatients6month'].plot(kind='bar',color='red',ax=ax2,width=width,position=1)
dffinal content
,6month,final-formula,numPatients6month
166047.0,1,7.794117647058823,680
82972.0,2,5.720823798627003,437
107227.0,3,5.734767025089606,558
111330.0,4,4.838709677419355,434
95591.0,5,3.3707865168539324,534
95809.0,6,3.611738148984198,443
98662.0,7,3.5523978685612785,563
192668.0,8,2.9978586723768736,467
89460.0,9,0.9708737864077669,515
192585.0,10,2.1653543307086616,508
184325.0,11,1.727447216890595,521
85068.0,12,1.0438413361169103,479
nocidffinal
,6month,final-formula,numPatients6month
137797.0,1,3.5934291581108826,974
267492.0,2,2.1705426356589146,645
269542.0,3,2.2106631989596877,769
271950.0,4,2.0,650
276638.0,5,1.5587529976019185,834
187719.0,6,1.9461077844311379,668
218512.0,7,1.1406844106463878,789
199830.0,8,0.8862629246676514,677
269469.0,9,0.3807106598984772,788
293390.0,10,0.9668508287292817,724
254783.0,11,1.2195121951219512,738
300974.0,12,0.9695290858725761,722
To compare the results of two data frames with a bar plot, one approach is to concatenate the two data frames and add a hue column.
For example, the two data frames below have the same x and y columns, and we want to compare their values. To do this, add a hue column to each df with a distinguishing constant, as below.
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df1=pd.DataFrame({'x':[1,2,3,4,5],'y':[10,2,454,121,34]})
df2=pd.DataFrame({'x':[4,1,2,5,3],'y':[54,12,65,12,8]})
df1['hue']=1
df2['hue']=2
res=pd.concat([df1,df2])
sns.barplot(x='x',y='y',data=res,hue='hue')
plt.show()
The result should look like the plot below:
To get two y-axes, try this method:
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax2 = ax1.twinx()
ax1.set_ylabel('final-formula')
ax2.set_ylabel('numPatients6month')
width=0.4
df1=df1.set_index('x').sort_index()
df2=df2.set_index('x').sort_index()
df1['y'].plot(kind='bar',color='blue',ax=ax1,width=width,position=1)
df2['y'].plot(kind='bar',color='green',ax=ax2,width=width,position=0)
plt.show()
With the actual input (df1 and df2 here stand for the two real data frames):
fig = plt.figure()
ax1 = fig.add_subplot(111)
ax2 = ax1.twinx()
ax1.set_ylabel('final-formula')
ax2.set_ylabel('numPatients6month')
width=0.4
df1=df1.set_index('6month').sort_index()
df2=df2.set_index('6month').sort_index()
df1['final-formula'].plot(kind='bar',color='blue',ax=ax1,width=width,position=1)
df2['numPatients6month'].plot(kind='bar',color='green',ax=ax2,width=width,position=0)
plt.show()
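The follow-up in the question asks for both data frames to be compared on final-formula, with the patient count only as extra information on the right axis. One possible way to sketch that, not taken from the answer above, is to put the two final-formula series side by side in one frame and draw the patient count as a line on a twin axis. The values below are the first three rows of each frame, rounded; the `combined` name, the choice of a line for the patient count, and the `Agg` backend line are assumptions of this sketch.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# First three rows of each frame from the question, rounded for brevity.
dffinal = pd.DataFrame({"6month": [1, 2, 3],
                        "final-formula": [7.79, 5.72, 5.73],
                        "numPatients6month": [680, 437, 558]}).set_index("6month")
nocidffinal = pd.DataFrame({"6month": [1, 2, 3],
                            "final-formula": [3.59, 2.17, 2.21],
                            "numPatients6month": [974, 645, 769]}).set_index("6month")

# One column per data frame, aligned on the 6month index.
combined = pd.DataFrame({"dffinal": dffinal["final-formula"],
                         "nocidffinal": nocidffinal["final-formula"]})

fig, ax1 = plt.subplots()
combined.plot(kind="bar", ax=ax1, color=["blue", "green"], rot=0)
ax1.set_ylabel("final-formula")

# Bar plots place categories at positions 0..n-1, so the line uses those positions.
ax2 = ax1.twinx()
ax2.plot(range(len(combined)), dffinal["numPatients6month"].values,
         color="red", marker="o", label="numPatients6month (dffinal)")
ax2.set_ylabel("numPatients6month")
ax2.legend(loc="upper center")
```

Here both sets of bars share the left axis, so their heights are directly comparable, and the right axis is informational only.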