Numbers on the y-axis are not arranged in order in matplotlib - pandas

The numbers on the y-axis of my plot are not arranged in order, and the words on the x-axis are too close together. I am scraping a Wikipedia table of COVID-19 cases, so I don't save it as a CSV; I plot it directly from the website.
My code is below:
import requests
import pandas as pd
import matplotlib.pyplot as plt
from bs4 import BeautifulSoup

URL="https://en.wikipedia.org/wiki/COVID19_pandemic_in_Nigeria"
html=requests.get(URL)
bsObj= BeautifulSoup(html.content, 'html.parser')
states= []
cases=[]
active=[]
recovered=[]
death=[]
for items in bsObj.find("table",{"class":"wikitable sortable"}).find_all('tr')[1:37]:
    data = items.find_all(['th',{"align":"left"},'td'])
    states.append(data[0].a.text)
    cases.append(data[1].b.text)
    active.append(data[2].text)
    recovered.append(data[3].text)
    death.append(data[4].text)
table= ["STATES","ACTIVE"]
tab= pd.DataFrame(list(zip(states,active)),columns=table)
tab["ACTIVE"]=tab["ACTIVE"].replace('\n','', regex=True)
x=tab["STATES"]
y=tab["ACTIVE"]
plt.cla()
plt.bar(x,y, color="green")
plt.xticks(fontsize= 5)
plt.yticks(fontsize= 8)
plt.title("PLOTTERSCA")
plt.show()

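It's hard to say without the data, but you can try converting the values to numbers and sorting the whole DataFrame before plotting, so that each state stays paired with its value:
tab["ACTIVE"] = pd.to_numeric(tab["ACTIVE"].str.replace(",", "", regex=False))
tab = tab.sort_values("ACTIVE")
plt.bar(tab["STATES"], tab["ACTIVE"], color="green")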

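Bar charts are plotted in the order the values are given. Also, the scraped values are strings, and matplotlib plots strings as categories rather than numbers, which is why the y-axis is not in numerical order. In this case, there is no need to create a DataFrame. Try the following:
import numpy as np

x=[s.replace('\n', '') for s in states]
# the scraped values are strings (e.g. "1,234\n"), so convert them to integers
y=np.array([int(s.strip().replace(',', '')) for s in active])
order = np.argsort(y)
xNew = [x[i] for i in order]
yNew = y[order]
plt.cla()
plt.bar(xNew,yNew, color="green")
plt.xticks(fontsize= 5, rotation=90)
plt.yticks(fontsize= 8)
plt.title("PLOTTERSCA")
plt.show()
Here, everything is reordered based on the y values, and the x tick labels are rotated so that they are easier to read.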
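A self-contained sketch of the same fix, using a small hypothetical DataFrame in place of the scraped table (the state names and counts are made up, and the comma/newline format of the raw strings is an assumption). It converts the strings to numbers, sorts the rows so each label stays with its value, and rotates the tick labels:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical scraped values: strings with newlines and thousands separators.
tab = pd.DataFrame({
    "STATES": ["Lagos", "FCT", "Kano", "Oyo"],
    "ACTIVE": ["1,204\n", "310\n", "95\n", "2,015\n"],
})

# Strip whitespace and commas, then convert to integers so matplotlib
# uses a numeric y-axis instead of treating each string as a category.
tab["ACTIVE"] = pd.to_numeric(
    tab["ACTIVE"].str.strip().str.replace(",", "", regex=False)
)

# Sort whole rows so each state stays paired with its own count.
tab = tab.sort_values("ACTIVE").reset_index(drop=True)

fig, ax = plt.subplots()
ax.bar(tab["STATES"], tab["ACTIVE"], color="green")
ax.tick_params(axis="x", labelrotation=90, labelsize=5)
fig.tight_layout()
```

Sorting the DataFrame rather than a single column is the design choice that matters here: sorting y on its own would detach the heights from their labels.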

Related

Stacked hue histogram

I don't have enough reputation to add inline images, sorry.
This is the code I found:
bins = np.linspace(df.Principal.min(), df.Principal.max(), 10)
g = sns.FacetGrid(df, col="Gender", hue="loan_status", palette="Set1", col_wrap=2)
g.map(plt.hist, 'Principal', bins=bins, ec="k")
g.axes[-1].legend()
plt.show()
I want to do something similar with some data I have:
bins = np.linspace(df.overall.min(), df.overall.max(), 10)
g = sns.FacetGrid(df, col="player_positions", hue="preferred_foot", palette="Set1", col_wrap=4)
g.map(plt.hist, 'overall', bins=bins, ec="k")
g.axes[-1].legend()
plt.show()
The hue column "preferred_foot" has only two values, Left and Right.
I am not sure why I can't see the Left values on the plot. The value counts are:
df['preferred_foot'].value_counts()
Right 13960
Left 4318
I am fairly sure those are not stacked histograms, just two histograms drawn one behind the other. I believe your "Left" red bars are simply hidden behind the "Right" blue bars.
You could try passing alpha=0.5 to g.map, or changing the order of the hues (add hue_order=['Right','Left'] to the call to FacetGrid).
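If you actually want stacked bars rather than overlapping ones, matplotlib's hist can stack a list of datasets with stacked=True. This is a swapped-in technique, not what FacetGrid.map(plt.hist, ...) does (it calls hist once per hue level). The sketch below shows a single facet with synthetic data, roughly in the question's Right:Left ratio:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
# Synthetic "overall" ratings for the two hue levels (not the real data).
right = rng.normal(66, 7, size=1396)
left = rng.normal(66, 7, size=432)

lo = min(right.min(), left.min())
hi = max(right.max(), left.max())
bins = np.linspace(lo, hi, 10)

fig, ax = plt.subplots()
# A list of arrays plus stacked=True draws "Left" on top of "Right",
# so neither group can hide behind the other.
n, edges, patches = ax.hist([right, left], bins=bins, stacked=True,
                            label=["Right", "Left"], ec="k")
ax.legend()
```

With stacked=True the second row of n holds the cumulative bar tops, so it is never lower than the first.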

Making multiple pie charts out of a pandas dataframe (one for each column)

My question is similar to Making multiple pie charts out of a pandas dataframe (one for each row).
However, instead of one pie per row, I want one pie per column.
I can make a pie chart for each column; however, since I have 12 columns, the pie charts are too close to each other.
I have used this code:
fig, axes = plt.subplots(4, 3, figsize=(10, 6))
for i, (idx, row) in enumerate(df.iterrows()):
    ax = axes[i // 3, i % 3]
    row = row[row.gt(row.sum() * .01)]
    ax.pie(row, labels=row.index, startangle=30)
    ax.set_title(idx)
fig.subplots_adjust(wspace=.2)
This gives one pie chart per row.
But what I want is the other way around. I need 12 pie charts (because I have 12 columns), and each pie chart should have 4 sections (leg, car, walk, and bike).
If I write this code instead:
fig, axes = plt.subplots(4,3)
for i, col in enumerate(df.columns):
    ax = axes[i // 3, i % 3]
    plt.plot(df[col])
then I get the following result.
If I use:
plot = df.plot.pie(subplots=True, figsize=(17, 8),labels=['pt','car','walk','bike'])
then I get the following result, which is quite close to what I am looking for, but the pie charts are unreadable. A clearer output would be better.
As in your linked post, I would use matplotlib.pyplot for this. The accepted answer there uses plt.subplots(2, 3); I suggest doing the same, but with enough rows for all 12 columns (4 rows of 3 plots), and drawing a pie on each axis.
Like this:
fig, axes = plt.subplots(4, 3, figsize=(12, 14))
for i, col in enumerate(df.columns):
    ax = axes[i // 3, i % 3]
    ax.pie(df[col], labels=df.index, startangle=30)
    ax.set_title(col)
Finally, I realized that if I swap rows and columns (transpose the DataFrame):
df_sw = df.T
then I can use the code from the examples in:
Making multiple pie charts out of a pandas dataframe (one for each row)
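A sketch of the one-pie-per-column layout, assuming a DataFrame whose 4 rows are the transport modes and whose 12 columns are the groups you want a pie for (the data and column names are hypothetical). A larger figsize and a single shared legend, instead of labels on every wedge, usually make 12 small pies easier to read:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: 4 modes (rows) x 12 groups (columns).
rng = np.random.default_rng(1)
df = pd.DataFrame(rng.integers(1, 100, size=(4, 12)),
                  index=["pt", "car", "walk", "bike"],
                  columns=[f"col{i}" for i in range(12)])

ncols = 3
nrows = -(-len(df.columns) // ncols)  # ceiling division: 12 columns -> 4 rows
fig, axes = plt.subplots(nrows, ncols, figsize=(9, 3 * nrows))

for ax, col in zip(axes.flat, df.columns):
    # One pie per column; each wedge is one row (mode of transport).
    wedges, _ = ax.pie(df[col], startangle=30)
    ax.set_title(col, fontsize=9)

# Hide any unused axes if the grid has more cells than there are columns.
for ax in axes.flat[len(df.columns):]:
    ax.set_visible(False)

# One shared legend keeps the small pies free of overlapping labels.
fig.legend(wedges, df.index, loc="upper right")
fig.tight_layout()
```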

How to display Summary statistics next to a plot using matplotlib or seaborn?

I am trying to create a function that iterates over the list of numerical features in a DataFrame and displays a histogram with summary statistics next to it. I am using plt.figtext() to display the statistics, but I am getting an error:
num_features=[n1,n2,n3]
for i in num_features:
    fig, ax = plt.subplots()
    plt.hist(df[i])
    plt.figtext(1,0.5,df[i].describe() )
    ax.set_title(i)
    plt.show()
When I do this, I get an error:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
It works fine if I use df[i].mean() instead of describe().
What am I doing wrong? Is there a better way to show a plot with some statistics next to it?
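plt.figtext() expects a string, but describe() returns a Series (or a DataFrame), which is why you get the "truth value of a Series is ambiguous" error. You can "simplify" your code by formatting the result of describe() as a string with to_string():
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.normal(size=(2000,)))
fig, ax = plt.subplots()
ax.hist(df[0])
plt.figtext(0.1,0.5, df.describe().to_string())
plt.figtext(0.75,0.5, df.describe().loc[['mean','std']].to_string())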
As shown in the solution above, the text formatting gets a little misaligned. To fix this, I added a workaround that splits the description into two text blocks (keys and values), which are then aligned separately.
The helper:
def describe_helper(series):
    splits = str(series.describe()).split()
    keys, values = "", ""
    for i in range(0, len(splits), 2):
        keys += "{:8}\n".format(splits[i])
        values += "{:>8}\n".format(splits[i+1])
    return keys, values
Now plot the graph:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

demo = np.random.uniform(0,10,100)
keys, values = describe_helper(pd.Series(demo))
plt.hist(demo, bins=10)
plt.figtext(.95, .49, keys, {'multialignment':'left'})
plt.figtext(1.05, .49, values, {'multialignment':'right'})
plt.show()
If you also want the figtext included when saving the image, pass bbox_inches='tight' so the saved image expands to include text placed outside the figure:
plt.savefig('fig.png', bbox_inches='tight')
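An alternative to splitting keys and values, assuming a fixed-width font is acceptable: to_string() already pads its columns, and they only drift because the default font is proportional. Rendering the text in a monospace font keeps them aligned in a single text block (the data here is synthetic):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

demo = pd.Series(np.random.default_rng(0).uniform(0, 10, 100))
stats_text = demo.describe().to_string()

fig, ax = plt.subplots()
ax.hist(demo, bins=10)
# In a monospace font every character has the same width, so the padded
# columns produced by to_string() stay aligned without a helper.
txt = fig.text(0.92, 0.5, stats_text, family="monospace", va="center")
```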
I added this based on the feedback, and it works fine now:
for i in num_cols:
    # number of bins from the Freedman-Diaconis rule: h = 2 * IQR * n**(-1/3)
    n_counts = df[i].count()
    iqr = df[i].quantile(0.75) - df[i].quantile(0.25)
    h = 2 * iqr * n_counts ** (-1 / 3)
    # bins = range / bin width (fall back to 1 bin if the IQR is zero)
    n_bins = int(np.ceil((df[i].max() - df[i].min()) / h)) if h > 0 else 1
    fig, ax = plt.subplots()
    ax.hist(df[i], bins=n_bins)
    plt.figtext(1, 0.5, s=df[i].describe().to_string())
    ax.set_title(i)
    plt.show()
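For reference, a sketch of the Freedman-Diaconis bin count as a standalone function (the helper name fd_bin_count and the data are hypothetical). NumPy implements the same rule as bins="fd", and matplotlib's hist passes that string through, so ax.hist(s, bins="fd") should give the same bins:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

def fd_bin_count(values):
    """Hypothetical helper: number of bins from the Freedman-Diaconis rule,
    bin width h = 2 * IQR * n**(-1/3), bins = ceil(range / h)."""
    x = np.asarray(values, dtype=float)
    x = x[~np.isnan(x)]                      # ignore missing values
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    h = 2.0 * iqr * x.size ** (-1.0 / 3.0)
    if h == 0:
        return 1                             # no spread in the middle 50%
    return int(np.ceil((x.max() - x.min()) / h))

# Synthetic stand-in for one numeric column of df.
s = pd.Series(np.random.default_rng(42).normal(50, 10, size=500), name="feature")

n_bins = fd_bin_count(s)
fig, ax = plt.subplots()
ax.hist(s, bins=n_bins)
fig.text(0.92, 0.5, s.describe().to_string(), va="center")
ax.set_title(s.name)
```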

How to plot separately for each column without looping through columns in a pandas df?

I am trying to plot each column of a DataFrame separately, with the index on the x-axis and the respective column on the y-axis. I can loop through the columns and plot them, but it is too slow. Any ideas on how to do this without looping?
I have created a dummy DataFrame below. With larger data it is far too slow. Many thanks for your help.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = pd.DataFrame(np.random.randint(0,10,size=(10, 4)), columns=['A','B','C','D'])
for j in range(len(data.columns)):
    y = data.iloc[:, j].values
    x = data.index
    fig = plt.figure()
    plt.plot(x, y)
    plt.show()
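One option, sketched below with the same kind of dummy data: DataFrame.plot(subplots=True) draws every column on its own subplot of a single figure, with the index on the x-axis. pandas still iterates over the columns internally, so this is not truly loop-free; the likely saving comes from creating one figure and calling plt.show() once instead of once per column:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

data = pd.DataFrame(np.random.randint(0, 10, size=(10, 4)),
                    columns=["A", "B", "C", "D"])

# One call, one figure: each column gets its own subplot, index on x.
# layout=(2, 2) arranges the 4 panels in a grid.
axes = data.plot(subplots=True, layout=(2, 2), figsize=(8, 6))
plt.tight_layout()
```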

How do I add error bars on a histogram?

I've created a histogram to see the number of similar values in a list.
import numpy as np
import matplotlib.pyplot as plt

data = np.genfromtxt("Pendel-Messung.dat")
stdm = (np.std(data))/((700)**(1/2))
breite = 700**(1/2)
fig2 = plt.figure()
ax1 = plt.subplot(111)
ax1.set_ylim(0,150)
ax1.hist(data, bins=breite)
ax2 = ax1.twinx()
ax2.set_ylim(0,150/700)
plt.show()
I want to create error bars (the error being stdm) in the middle of each bar of the histogram. I know I can create error bars using
plt.errorbar("something", data, yerr = stdm)
But how do I make them start in the middle of each bar? I thought of just adding breite/2, but that gives me an error.
Sorry, I'm a beginner! Thank you!
ax.hist returns the frequencies (n), the bin edges and the patches, so we can use the first two for y and x in the call to errorbar. Also, the bins argument of hist takes either an integer (the number of bins) or a sequence of bin edges. I think you were trying to give a bin width of breite? If so, this should work (you just need to choose an appropriate xmax):
n,bin_edges,patches = ax1.hist(data,bins=np.arange(0,xmax,breite))
x = bin_edges[:-1]+breite/2.
ax1.errorbar(x,n,yerr=stdm,linestyle='None')
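A runnable sketch of the answer's idea, with synthetic data standing in for Pendel-Messung.dat (which is not available here). It passes an integer bin count and computes the centers as the midpoints of consecutive edges, which also works when bins have unequal widths; yerr is the question's stdm:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-in for the 700 pendulum measurements.
rng = np.random.default_rng(0)
data = rng.normal(2.0, 0.05, size=700)

stdm = np.std(data) / np.sqrt(data.size)   # the question's error value
n_bins = int(round(np.sqrt(data.size)))    # sqrt(700) ~ 26.5, rounded to an int

fig, ax = plt.subplots()
n, bin_edges, patches = ax.hist(data, bins=n_bins, ec="k")

# Bin centers: midpoint of each pair of consecutive edges.
centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
ax.errorbar(centers, n, yerr=stdm, fmt="none", ecolor="red", capsize=2)
```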