plotting a pandas dataframe row by row - pandas

I have the following dataframe:
I want to create one pie chart per row. I am having trouble with the layout: each chart should have a figsize of, say, (5, 5), and every row of my dataframe should be its own row of plots in the subplots, with the index as the title.
I tried many combinations and experimented with pyplot.subplots, but without success.
I would be glad for some help.
Thanks

You can either transpose your dataframe and use the pandas pie kind for plotting, i.e. df.transpose().plot(kind='pie', subplots=True), or iterate through the rows while creating subplots.
An example using subplots:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Recreate a similar dataframe
rows = ['rows {}'.format(i) for i in range(5)]
columns = ['hits', 'misses']
col1 = np.random.random(5)
col2 = 1 - col1
data = zip(col1, col2)
df = pd.DataFrame(data=data, index=rows, columns=columns)
# Plotting
fig = plt.figure(figsize=(15,10))
for i, (name, row) in enumerate(df.iterrows()):
    ax = plt.subplot(2,3, i+1)
    ax.set_title(row.name)
    ax.set_aspect('equal')
    ax.pie(row, labels=row.index)
plt.show()
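The question asks for one chart per dataframe row, each roughly 5 x 5 inches, stacked as rows of the figure. A minimal sketch of that layout, assuming the same hits/misses columns as above; the random data, the single-column grid, and the headless Agg backend are assumptions made so the example runs anywhere:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Recreate a similar dataframe (illustrative random data)
rng = np.random.default_rng(0)
hits = rng.random(5)
df = pd.DataFrame({"hits": hits, "misses": 1 - hits},
                  index=[f"rows {i}" for i in range(5)])

# One subplot row per dataframe row, about 5 x 5 inches each
fig, axes = plt.subplots(nrows=len(df), ncols=1,
                         figsize=(5, 5 * len(df)), squeeze=False)
for ax, (name, row) in zip(axes[:, 0], df.iterrows()):
    ax.pie(row, labels=row.index)
    ax.set_title(name)        # index label as the title
    ax.set_aspect("equal")
fig.tight_layout()
```

Call plt.show() or fig.savefig(...) afterwards, depending on whether you want a window or a file.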

Related

plotting dataframe values in loop

I have a CSV file named data.csv that contains [10000 rows x 1 columns], and I read the data as
a=read_csv("data.csv")
print(a)
dt
0 3.5257,3.5257,3.5257,3.5257,3.5257,3.5257,3.52...
1 3.3414,3.3414,3.3414,3.3414,3.3414,3.3414,3.34...
2 3.5722,3.5722,3.5722,3.5722,3.5722,3.5722,3.57...
.....
Now I want to plot each indexed row in a loop and save each plot under the name of its index value. For example,
I need to produce 0.jpg from the values at index 0 ----> 3.5257,3.5257,3.5257,3.5257,3.5257,3.5257,3.52...
and similarly 1.jpg from the values at index 1 ----> 3.3414,3.3414,3.3414,3.3414,3.3414,3.3414,3.34...
My code
a=read_csv("data.csv")
print(a)
for data in a:
    plt.plot(data)
I hope the experts can help me with an answer. Thanks.
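Use savefig to create an image file. Each row is a single comma-separated string, so split it and convert the pieces to floats before plotting:
import matplotlib.pyplot as plt
for name, data in a['dt'].str.split(',').items():
    # convert the split strings to numbers so the y-axis is numeric, not categorical
    plt.plot([float(v) for v in data])
    plt.savefig(f'{name}.jpg')
    plt.close()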
This writes 0.jpg, 1.jpg, and so on, one file per row.
With the following file.csv example:
3.5257,3.5257,3.5257,3.5257,3.5257,3.5257,3.52
3.3414,3.3414,3.3414,3.3414,3.3414,3.3414,3.34
3.5722,3.5722,3.5722,3.5722,3.5722,3.5722,3.57
Here is one way to do it with Pandas iterrows and Python f-strings:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv("../file.csv", header=None)
for i, row in df.iterrows():
    fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(8, 4))
    ax.plot(row)
    fig.savefig(f"./{i}.png")
    plt.close(fig)  # release the figure so thousands of rows do not pile up in memory
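For the layout in the question (a single dt column holding comma-separated strings), the split, the numeric conversion, and the save loop can be combined. This is only a sketch: the two-row in-memory CSV, the temporary output directory, the PNG extension, and the Agg backend are all assumptions for illustration, not part of the original data.

```python
import io
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, no window needed to save files
import matplotlib.pyplot as plt
import pandas as pd

# Tiny stand-in for data.csv: one column "dt" of quoted comma-separated values
csv_text = 'dt\n"3.5257,3.5257,3.52"\n"3.3414,3.3414,3.34"\n'
a = pd.read_csv(io.StringIO(csv_text))

# Split each string into its own columns and convert everything to float
values = a["dt"].str.split(",", expand=True).astype(float)

out_dir = tempfile.mkdtemp()  # hypothetical output location
saved = []
for name, row in values.iterrows():
    fig, ax = plt.subplots()
    ax.plot(row.to_numpy())
    path = os.path.join(out_dir, f"{name}.png")  # file named after the index value
    fig.savefig(path)
    plt.close(fig)  # free the figure before the next iteration
    saved.append(path)
```

Swapping the extension to .jpg should work the same way, provided the installed matplotlib can write JPEG files.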

Making a Scatter Plot from a DataFrame in Pandas

I have a DataFrame and need to make a scatter-plot from it.
I need to use 2 columns as the x-axis and y-axis and only need to plot 2 rows from the entire dataset. Any suggestions?
For example, my dataframe is below (50 states x 4 columns). I need to plot 'rgdp_change' on the x-axis vs 'diff_unemp' on the y-axis, and only need to plot for the states, "Michigan" and "Wisconsin".
So from the dataframe, you'll need to select the rows from a list of the states you want: ['Michigan', 'Wisconsin']
I also figured you would probably want a legend or some way to differentiate one point from the other. To do this, we create a colormap assigning a different color to each state. This way the code is generalizable for more than those two states.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm
import matplotlib.colors as colors
# generate a random df with the relevant rows, columns to your actual df
df = pd.DataFrame({'State':['Alabama', 'Alaska', 'Michigan', 'Wisconsin'], 'real_gdp':[1.75*10**5, 4.81*10**4, 2.59*10**5, 1.04*10**5],
'rgdp_change': [-0.4, 0.5, 0.4, -0.5], 'diff_unemp': [-1.3, 0.4, 0.5, -11]})
fig, ax = plt.subplots()
states = ['Michigan', 'Wisconsin']
colormap = cm.viridis
colorlist = [colors.rgb2hex(colormap(i)) for i in np.linspace(0, 0.9, len(states))]
for state, c in zip(states, colorlist):
    # select the single row for this state so the label always matches the point
    row = df.loc[df["State"] == state]
    ax.scatter(row["rgdp_change"], row["diff_unemp"], label=state, s=50, linewidth=0.1, color=c)
ax.legend()
plt.show()
Use the dataframe plot method, but first filter the states you need using the index isin method (this assumes the state names are the dataframe's index):
states = ["Michigan", "Wisconsin"]
df[df.index.isin(states)].plot(kind='scatter', x='rgdp_change', y='diff_unemp')
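A self-contained sketch of this approach, assuming the state names have to be moved into the index first. The sample values are the illustrative ones from the other answer, and the annotation step and Agg backend are additions of this sketch rather than part of the original answer:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative data; set_index makes the state names the index
df = pd.DataFrame({
    "State": ["Alabama", "Alaska", "Michigan", "Wisconsin"],
    "rgdp_change": [-0.4, 0.5, 0.4, -0.5],
    "diff_unemp": [-1.3, 0.4, 0.5, -11],
}).set_index("State")

states = ["Michigan", "Wisconsin"]
subset = df[df.index.isin(states)]

ax = subset.plot(kind="scatter", x="rgdp_change", y="diff_unemp")

# Label each point with its state so the two can be told apart
for state, row in subset.iterrows():
    ax.annotate(state, (row["rgdp_change"], row["diff_unemp"]),
                textcoords="offset points", xytext=(5, 5))
```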

How to plot a grid of histograms with Matplotlib in the order of the DataFrame columns?

Consider the simple data frame below:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'var3':[1,3,9,6,1,6,3,1,1,3],
'var1':[9,1,2,6,6,5,9,3,1,7],
'var2':[6,6,2,9,8,3,5,4,1,3]})
df
Now, let's plot a set of histograms from this data:
df.hist(layout=(1,3))
plt.show()
Note that the order (from left to right) of the histograms in the figure is different from the order of the columns in the data frame. How can I make the histograms follow the column order of the data frame?
I could not find a way to do that within the df.hist() function. But you can accomplish it with the simple loop below:
fig, ax = plt.subplots(1, len(df.columns), figsize=(3*len(df.columns), 3))
for i, var in enumerate(df):
    df[var].hist(ax=ax[i])
    ax[i].set_title(var)
plt.show()
Result:
I like @foglerit's answer, but here's another workaround solution:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame({'var3':[1,3,9,6,1,6,3,1,1,3],
'var1':[9,1,2,6,6,5,9,3,1,7],
'var2':[6,6,2,9,8,3,5,4,1,3]})
columns = df.columns # save original column names
columns_temp = [] # create temporary column names, numbered
for i, col in enumerate(df.columns):
    columns_temp.append('(' + str(i+1) + ') ' + str(col))
df.columns = columns_temp
df.hist(layout=(1,3)) # now the column order is not messed up
df.columns = columns # reassign original column names
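When there are more columns than fit on one row, the loop approach can be wrapped into a grid that still follows the column order. A minimal sketch; hist_grid is a hypothetical helper name, and the ncols default and Agg backend are assumptions:

```python
import math

import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import pandas as pd


def hist_grid(df, ncols=3, **hist_kwargs):
    """Draw one histogram per column, in dataframe column order (hypothetical helper)."""
    nrows = math.ceil(len(df.columns) / ncols)
    fig, axes = plt.subplots(nrows, ncols,
                             figsize=(3 * ncols, 3 * nrows), squeeze=False)
    flat = axes.ravel()
    for ax, col in zip(flat, df.columns):
        df[col].hist(ax=ax, **hist_kwargs)
        ax.set_title(col)
    # Hide any unused cells at the end of the grid
    for ax in flat[len(df.columns):]:
        ax.set_visible(False)
    return fig, axes


df = pd.DataFrame({'var3': [1, 3, 9, 6, 1, 6, 3, 1, 1, 3],
                   'var1': [9, 1, 2, 6, 6, 5, 9, 3, 1, 7],
                   'var2': [6, 6, 2, 9, 8, 3, 5, 4, 1, 3]})
fig, axes = hist_grid(df, ncols=2)
```

With ncols=2 the three columns fill a 2 x 2 grid left to right, top to bottom, and the fourth cell is hidden.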

Distribution probabilities for each column data frame, in one plot

I am creating probability distributions for each column of my data frame with sns.distplot() from the seaborn library. For one plot I do
x = df['A']
sns.distplot(x);
I am trying to use FacetGrid and map to draw the plots for all columns at once, in this way, but it doesn't work at all.
g = sns.FacetGrid(df, col = 'A','B','C','D','E')
g.map(sns.distplot())
I think you need to use melt to reshape your dataframe to long format, see this MCVE:
df = pd.DataFrame(np.random.random((100,5)), columns = list('ABCDE'))
dfm = df.melt(var_name='columns')
g = sns.FacetGrid(dfm, col='columns')
g = (g.map(sns.distplot, 'value'))
Output:
From seaborn 0.11.2 it is not recommended to use FacetGrid directly. Instead, use sns.displot for figure-level plots.
np.random.seed(2022)
df = pd.DataFrame(np.random.random((100,5)), columns = list('ABCDE'))
dfm = df.melt(var_name='columns')
g = sns.displot(data=dfm, x='value', col='columns', col_wrap=3, common_norm=False, kde=True, stat='density')
You're getting this wrong on two levels.
1. Python syntax.
FacetGrid(df, col = 'A','B','C','D','E') is invalid: col gets set to 'A', and the remaining strings are interpreted as further positional arguments. Positional arguments may not follow a keyword argument, so this is a Python syntax error.
2. Seaborn concepts.
Seaborn expects a single column name as input for the col or row argument. This means that the dataframe needs to be in a format that has one column which determines to which column or row the respective datum belongs.
You also do not call the function passed to map yourself; you pass the function object, and map calls it.
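Both points can be checked without seaborn installed. The sketch below compiles the offending call as a string to show that Python rejects it before anything runs, and then uses a hypothetical helper, apply_to_each, to illustrate passing a function object instead of calling it:

```python
# 1. A positional argument after a keyword argument is rejected at compile time.
source = "FacetGrid(df, col='A', 'B', 'C', 'D', 'E')"
try:
    compile(source, "<example>", "eval")
    message = None
except SyntaxError as exc:
    message = exc.msg  # exact wording depends on the Python version
print(message)


# 2. Pass the function itself; the receiving code decides when to call it.
def apply_to_each(func, values):
    """Hypothetical stand-in for an API like FacetGrid.map."""
    return [func(v) for v in values]


result = apply_to_each(abs, [-1, 2])  # abs, not abs()
print(result)
```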
Solutions:
Loop over columns:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame(np.random.randn(14,5), columns=list("ABCDE"))
fig, axes = plt.subplots(ncols=5)
for ax, col in zip(axes, df.columns):
    sns.distplot(df[col], ax=ax)
plt.show()
Melt dataframe
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.DataFrame(np.random.randn(14,5), columns=list("ABCDE"))
g = sns.FacetGrid(df.melt(), col="variable")
g.map(sns.distplot, "value")
plt.show()
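To see what melt does to the shape of the data, here is a small sketch on a toy frame (the 2 x 3 integer data is an assumption for illustration). Each original column name ends up in the default "variable" column, which is what col="variable" then facets on:

```python
import numpy as np
import pandas as pd

# Wide format: one column per variable
wide = pd.DataFrame(np.arange(6).reshape(2, 3), columns=list("ABC"))

# Long format: one row per (variable, value) pair
long = wide.melt()

print(long)
print(list(long.columns))
print(long["variable"].tolist())
```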
You can use the following:
# listing dataframes types
list(set(df.dtypes.tolist()))
# include only float and integer
df_num = df.select_dtypes(include = ['float64', 'int64'])
# display what has been selected
df_num.head()
# plot
df_num.hist(figsize=(16, 20), bins=50, xlabelsize=8, ylabelsize=8);
I think the easiest approach is to just loop the columns and create a plot.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(np.random.random((100,5)), columns = list('ABCDE'))
for col in df.columns:
    hist = df[col].hist(bins=10)
    print("Plotting for column {}".format(col))
    plt.show()
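If the goal is literally one plot, the columns can also be overlaid on a single Axes with the pandas plotting API instead of looping. A sketch under assumptions: the seed, the alpha value, the density normalisation, and the Agg backend are choices made here, not taken from the answers above.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(2022)
df = pd.DataFrame(rng.random((100, 5)), columns=list("ABCDE"))

# All five columns share one Axes; alpha keeps overlapping bars readable
ax = df.plot.hist(bins=10, alpha=0.4, density=True, figsize=(8, 4))
ax.set_xlabel("value")

legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
print(legend_labels)
```

Overlaid histograms get hard to read beyond a handful of columns, in which case the faceted layouts above are probably the better choice.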

Combine two dataframe boxplots in a twinx figure

I want to display two Pandas dataframes within one figure as boxplots.
As each of the two dataframes has different value range, I would like to have them combined in a twinx figure.
Reduced to the minimum, I have tried the following:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df1 = pd.DataFrame(np.random.randint(0,100,size=(100, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(100,200,size=(100, 2)), columns=list('EF'))
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
df1.boxplot(ax=ax1)
df2.boxplot(ax=ax2)
plt.show()
The result is expectedly not what it should look like (there should be 6 boxes on the plot, actually!)
How can I manage to have the boxplots next to each other?
I tried to set some dummy scatter points on ax1 and ax2, but this did not really help.
One solution is to concatenate the data frames for plotting and use a mask. To create the mask, we use (dfs == dfs) | dfs.isnull() to build a full matrix of True values, and then we restrict it to all column names that are not 'E' or 'F'. This gives a 2D matrix that lets you plot only the first four boxes, as the last two are masked (so their ticks still appear at the bottom). With the inverse mask ~mask you plot the last two on their own axis and mask the first four.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
df1 = pd.DataFrame(np.random.randint( 0,100,size=(100, 4)), columns=list('ABCD'))
df2 = pd.DataFrame(np.random.randint(100,200,size=(100, 2)), columns=list('EF' ))
dfs = pd.concat([df1, df2])
mask = ((dfs == dfs) | dfs.isnull()) & (dfs.columns != 'E') & (dfs.columns != 'F')
fig, ax1 = plt.subplots()
dfs[mask].boxplot(ax=ax1)   # A-D on the left axis
ax2 = ax1.twinx()
dfs[~mask].boxplot(ax=ax2)  # E and F on the right axis
plt.show()
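An alternative that avoids concatenating and masking is to call matplotlib's Axes.boxplot directly and place each group at explicit x positions. This swaps in a different technique from the answer above; the positions, the manual tick labels, and the Agg backend are assumptions of this sketch:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.integers(0, 100, size=(100, 4)), columns=list("ABCD"))
df2 = pd.DataFrame(rng.integers(100, 200, size=(100, 2)), columns=list("EF"))

fig, ax1 = plt.subplots()
ax2 = ax1.twinx()

# Boxes for df1 sit at x = 1..4 (left y-axis), boxes for df2 at x = 5..6 (right y-axis)
pos1 = np.arange(1, len(df1.columns) + 1)
pos2 = np.arange(len(df1.columns) + 1, len(df1.columns) + len(df2.columns) + 1)

# manage_ticks=False stops each call from overwriting the shared x ticks
ax1.boxplot(df1.to_numpy(), positions=pos1, manage_ticks=False)
ax2.boxplot(df2.to_numpy(), positions=pos2, manage_ticks=False)

# Set the shared x-axis once, covering all six boxes
labels = list(df1.columns) + list(df2.columns)
ax1.set_xticks(np.concatenate([pos1, pos2]))
ax1.set_xticklabels(labels)
ax1.set_xlim(0.5, len(labels) + 0.5)
```

Each y-axis autoscales to its own group, so the two value ranges no longer squash each other.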