I've created a histogram to see the number of similar values in a list.
data = np.genfromtxt("Pendel-Messung.dat")
stdm = (np.std(data))/((700)**(1/2))
breite = 700**(1/2)
fig2 = plt.figure()
ax1 = plt.subplot(111)
ax1.set_ylim(0,150)
ax1.hist(data, bins=breite)
ax2 = ax1.twinx()
ax2.set_ylim(0,150/700)
plt.show()
I want to create error bars (the error being stdm) in the middle of each bar of the histogram. I know I can create errorbars using
plt.errorbar("something", data, yerr = stdm)
But how do I make them start in the middle of each bar? I thought of just adding breite/2, but that gives me an error.
Sorry, I'm a beginner! Thank you!
ax.hist returns the frequencies (n), the bin edges, and the bar patches, so we can use the bin edges and n for x and y in the call to errorbar. Also, the bins argument of hist takes either an integer number of bins or a sequence of bin edges. I think you were trying to give a bin width of breite? If so, this should work (you just need to select an appropriate xmax):
n,bin_edges,patches = ax.hist(data,bins=np.arange(0,xmax,breite))
x = bin_edges[:-1]+breite/2.
ax.errorbar(x,n,yerr=stdm,linestyle='None')
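If the intent was instead to use roughly sqrt(N) bins rather than a bin width of sqrt(N), the bin centres can be computed directly from the returned edges. The following is only a sketch under that assumption: the synthetic data stands in for Pendel-Messung.dat, and the names n_bins and centers are made up for the illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line for interactive use
import matplotlib.pyplot as plt

# Synthetic stand-in for Pendel-Messung.dat (assumption: 700 readings)
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.1, size=700)

stdm = np.std(data) / np.sqrt(data.size)   # same quantity as in the question
n_bins = int(round(np.sqrt(data.size)))    # the bin count must be an integer

fig, ax = plt.subplots()
counts, bin_edges, patches = ax.hist(data, bins=n_bins)

# Midpoint of each bar: average of its left and right edge
centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
ax.errorbar(centers, counts, yerr=stdm, linestyle="None",
            color="black", capsize=2)
```

Averaging neighbouring edges works for any binning, including unequal bin widths, so no separate width variable is needed.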
I have two plots generated using matplotlib. The first represents my background and the second a group of points which I want to show. Is there a way to overlay the two plots?
background:
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize = (10,10))
grid_duomo = gpd.read_file('/content/Griglia_2m-SS.shp')
grid_duomo.to_crs(epsg=32632).plot(ax=ax, color='lightgrey')
points:
fig = plt.figure(figsize=(10, 10))
ids = traj_collection_df_new_app['id'].unique()
for id_ in ids:
    self_id = traj_collection_df_new_app[traj_collection_df_new_app['id'] == id_]
    plt.plot(
        self_id['lon'],
        self_id['lat'],
        # markers= 'o',
        # markersize=12
    )
plt.plot() will always take the most recent axis found by matplotlib and use it for plotting.
It's practically the same as plt.gca().plot(), where plt.gca() stands for "get current axis".
To get full control over which axis is used, you should do something like this:
(the zorder argument is used to set the "vertical stacking" of the artists, e.g. zorder=2 will be plotted on top of zorder=1)
f = plt.figure() # create a figure
ax = f.add_subplot( ... ) # create an axis in the figure f
ax.plot(..., zorder=0)
grid_duomo.plot(ax=ax, ..., zorder=1)
# you can then continue to add more axes to the same figure using
# f.add_subplot() or f.add_axes()
(If this is unclear, the quick start guide of matplotlib may help.)
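As a rough, self-contained illustration of drawing both layers into one Axes, the sketch below replaces the shapefile with plain grey grid lines and the trajectory DataFrame with a small made-up dictionary called tracks. Both are assumptions for the example; with geopandas you would pass the same ax to .plot(ax=ax, ...) as in the question.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line for interactive use
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(10, 10))

# Background layer: grey grid lines standing in for the shapefile grid
for k in range(5):
    ax.axhline(k, color="lightgrey", zorder=0)
    ax.axvline(k, color="lightgrey", zorder=0)

# Foreground layer: hypothetical trajectories keyed by id (lon list, lat list)
tracks = {
    1: ([0.5, 1.5, 2.5], [0.5, 1.0, 2.0]),
    2: ([3.5, 2.5, 1.5], [0.5, 1.5, 3.5]),
}
for id_, (lon, lat) in tracks.items():
    # Same Axes object, higher zorder, so the points sit on top of the grid
    ax.plot(lon, lat, marker="o", zorder=2, label=f"id {id_}")

ax.legend()
```

The key point is that only one figure and one Axes are created, and every plotting call names that Axes explicitly instead of relying on plt.plot().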
I have created a stacked bar plot using plotly.express. Each X-axis category has two correspondent Y-values that are stacked to give the total value of the two combined.
How can I add an individual error bar for each Y-value?
I have tried several options that all yield the same result: the same error value is added to both stacked bars. The error_y values are found in two separate columns in the dataframe, "st_dev_PHB_%" and "st_dev_PHV_%", which correspond to 6 categorical values (x="C").
My intuition tells me it's best to merge them into a new column in the dataframe, since I load the dataframe in the bar plot. However, each solution I try either gives an error or adds the same value to each pair of Y-values.
It would be nice if it were possible to have X error_y values corresponding to the X variables loaded in y=[...,...]. But that would of course be too easy...
data_MM = read_csv(....)
#data_MM["error_bar"] = data_MM[['st_dev_PHB_%', 'st_dev_PHV_%']].apply(tuple, axis=1).tolist()
#This one adds the values together instead of adding them to same list.
#data_MM["error_bar"] = data_MM['st_dev_PHB_%'] + data_MM['st_dev_PHV_%']
#data_MM["error_bar"] = data_MM[["st_dev_PHB_%", "st_dev_PHV_%"]].values.tolist()
#data_MM["error_bar"] = list(zip(data_MM['st_dev_PHB_%'],data_MM['st_dev_PHV_%']))
bar_plot = px.bar(data_MM, x="C", y=["PHB_wt%", "PHV_wt%"], hover_data =["PHA_total_wt%"], error_y="error_bar")
bar_plot.show()
The most commonly endured error message:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
I see your problem with the same error bar being used in both bars in the stack. However, I got a working solution with Plotly.graph_objs. The only downside was the second bar is plotting at the front, and therefore the upper half of the lower error bar is covered. At least you can still read off the error value from the hover data.
Here is the full code:
import numpy as np
import plotly.graph_objs as go

n = 20
x = list(range(1, n + 1))
y1 = np.random.random(n)
y2 = y1 + np.random.random(n)
e1 = y1 * 0.2
e2 = y2 * 0.05
trace1 = go.Bar(x=x, y=y1, error_y=dict(type='data', array=e1), name="Trace 1")
trace2 = go.Bar(x=x, y=y2, error_y=dict(type='data', array=e2), name="Trace 2")
fig = go.Figure(data=[trace1, trace2])
fig.update_layout(title="Test Plot", xaxis_title="X axis", yaxis_title="Y axis", barmode="stack")
fig.show()
Here is a resulting plot (top plot showing one error value, bottom plot showing different error value for the same bar stack):
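Another route that may be worth trying is to reshape the wide table into long form, so that every bar segment has its own row and its own error value. The sketch below covers only the pandas part. The column names come from the question, the numbers are invented, and the names pairs and data_long are made up for the illustration.

```python
import pandas as pd

# Invented numbers; the column names are the ones from the question
data_MM = pd.DataFrame({
    "C": ["a", "b", "c"],
    "PHB_wt%": [10.0, 20.0, 30.0],
    "PHV_wt%": [5.0, 6.0, 7.0],
    "st_dev_PHB_%": [1.0, 2.0, 3.0],
    "st_dev_PHV_%": [0.5, 0.6, 0.7],
})

# Which error column belongs to which value column
pairs = {"PHB_wt%": "st_dev_PHB_%", "PHV_wt%": "st_dev_PHV_%"}

# One block of rows per component, each row carrying its own error
long_parts = []
for value_col, error_col in pairs.items():
    part = pd.DataFrame({
        "C": data_MM["C"],
        "component": value_col,
        "value": data_MM[value_col],
        "error": data_MM[error_col],
    })
    long_parts.append(part)

data_long = pd.concat(long_parts, ignore_index=True)
```

A long table like this should then be usable with px.bar(data_long, x="C", y="value", color="component", error_y="error"), since error_y accepts a column name and each colour group becomes its own trace.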
I am trying to plot multiple different plots on a single matplotlib figure within a for loop. At the moment it all works in MATLAB, as shown in the picture below, and I am then able to save the figure as a video frame. Here is a link to a sample video generated in MATLAB for 10 frames
In Python, I tried it as below:
import matplotlib.pyplot as plt
for frame in range(FrameStart,FrameEnd):#loop1
    # data generation code within a for loop for n frames from source video
    array1 = np.zeros((200, 3800))
    array2 = np.zeros((19,2))
    array3 = np.zeros((60,60))
    for i in range(len(array2)):#loop2
        #generate data for arrays 1 to 3 from the frame data
        pass
    #end loop2
    plt.subplot(6,1,1)
    plt.imshow(DataArray,cmap='gray')
    plt.subplot(6, 1, 2)
    plt.bar(data2D[:,0], data2D[:,1])
    plt.subplot(2, 2, 3)
    plt.contourf(mapData)
    # for the fourth plot, use array2[3] and array2[5], plot it as shown and
    # keep this plot without erasing it for the next frame
I am not sure how to do the 4th axes with line plots. It needs to persist (done using hold on for this axis in MATLAB) for the entire sequence of frames processed in the for loop, while the other 3 axes need to be erased and updated with new data for each frame of the movie. The contour plot needs to stay square at all times, with a color bar on the side. At the end of each frame's processing, once all the axes are updated, the figure needs to be saved as a frame of a movie. Again, this is easily done in MATLAB, but I am not sure how to do it in Python.
Any suggestions
thanks
I guess you need something like this format.
I have used comments # in code to answer your queries. Please check the snippet
import matplotlib.pyplot as plt
fig=plt.figure(figsize=(6,6))
ax1=fig.add_subplot(311) #3rows 1 column 1st plot
ax2=fig.add_subplot(312) #3rows 1 column 2nd plot
ax3=fig.add_subplot(325) #3rows 2 column 5th plot
ax4=fig.add_subplot(326) #3rows 2 column 6th plot
plt.show()
To hide the axis lines, ticks and labels you can use plt.axis('off'). I don't know what your image data looks like, so I left the first plot blank. You can adjust figsize to suit your requirements.
import numpy as np
from numpy import random
import matplotlib.pyplot as plt
fig=plt.figure(figsize=(6,6)) #First is width Second is height
ax1=fig.add_subplot(311)
ax2=fig.add_subplot(312)
ax3=fig.add_subplot(325)
ax4=fig.add_subplot(326)
#Bar Plot
langs = ['C', 'C++', 'Java', 'Python', 'PHP']
students = [23,17,35,29,12]
ax2.bar(langs,students)
#Contour Plot
xlist = np.linspace(-3.0, 3.0, 100)
ylist = np.linspace(-3.0, 3.0, 100)
X, Y = np.meshgrid(xlist, ylist)
Z = np.sqrt(X**2 + Y**2)
cp = ax3.contourf(X, Y, Z)
fig.colorbar(cp,ax=ax3) #Add a colorbar to a plot
#Multiple line plot
x = np.linspace(-1, 1, 50)
y1 = 2*x + 1
y2 = 2**x + 1
ax4.plot(x, y2)
ax4.plot(x, y1, color='red',linewidth=1.0)
plt.tight_layout() #Make sure the plots don't overlap
plt.show()
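For the per-frame part of the question, one possible pattern is to clear only the three axes that change and leave the line axes alone, which plays the role of MATLAB's hold on. This is a sketch with random stand-in data. The names ax_img, ax_bar, ax_map, ax_line, cax and frames are made up here, and capturing the canvas as an RGBA array is just one way to obtain a frame that a video writer of your choice could consume.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; remove for interactive use
import matplotlib.pyplot as plt
from mpl_toolkits.axes_grid1 import make_axes_locatable

rng = np.random.default_rng(0)
n_frames = 3  # hypothetical number of video frames

fig = plt.figure(figsize=(6, 6))
ax_img = fig.add_subplot(311)
ax_bar = fig.add_subplot(312)
ax_map = fig.add_subplot(325)
ax_line = fig.add_subplot(326)  # never cleared, so its lines accumulate
# Dedicated colorbar axes, so the contour axes is not shrunk again each frame
cax = make_axes_locatable(ax_map).append_axes("right", size="5%", pad=0.05)

frames = []
for frame in range(n_frames):
    # Erase only the axes that are redrawn for every frame
    for ax in (ax_img, ax_bar, ax_map, cax):
        ax.cla()

    ax_img.imshow(rng.random((20, 380)), cmap="gray", aspect="auto")
    ax_bar.bar(np.arange(19), rng.random(19))
    cs = ax_map.contourf(rng.random((60, 60)))
    ax_map.set_aspect("equal")  # keep the contour plot square
    fig.colorbar(cs, cax=cax)

    # No clearing here: each frame adds one more line to the same axes
    ax_line.plot(np.arange(19), rng.random(19))

    # Render and keep the pixels of this frame as an (height, width, 4) array
    fig.canvas.draw()
    frames.append(np.asarray(fig.canvas.buffer_rgba()).copy())
```

Alternatively, fig.savefig() with a numbered file name inside the loop would write one image per frame to disk.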
I am trying to create a function that will iterate over the list of numerical features in a dataframe to display histogram and summary statistics next to it. I am using plt.figtext() to display the statistics but I am getting an error
num_features=[n1,n2,n3]
for i in num_features:
    fig, ax = plt.subplots()
    plt.hist(df[i])
    plt.figtext(1,0.5,df[i].describe() )
    ax.set_title(i)
    plt.show()
When I do this I get an error/warning message:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()
It works fine if I use df[n].mean() instead of describe().
What am I doing wrong? Is there a better way to print a plot and show some statistics next to it?
You can "simplify" your code, by formatting the dataframe returned by describe() as a string using to_string():
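import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame(np.random.normal(size=(2000,)))
fig, ax = plt.subplots()
ax.hist(df[0])
# describe() returns a DataFrame/Series; figtext needs a plain string
plt.figtext(0.1,0.5, df.describe().to_string())
plt.figtext(0.75,0.5, df.describe().loc[['mean','std']].to_string())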
As shown in the solution above, the text formatting gets messed up a little. To fix this, I added a workaround in which we split the description into two text blocks (labels and values), which are then aligned separately.
The helper:
def describe_helper(series):
    splits = str(series.describe()).split()
    keys, values = "", ""
    for i in range(0, len(splits), 2):
        keys += "{:8}\n".format(splits[i])
        values += "{:>8}\n".format(splits[i+1])
    return keys, values
Now plot the graph:
demo = np.random.uniform(0,10,100)
plt.hist(demo, bins=10)
plt.figtext(.95, .49, describe_helper(pd.Series(demo))[0], {'multialignment':'left'})
plt.figtext(1.05, .49, describe_helper(pd.Series(demo))[1], {'multialignment':'right'})
plt.show()
If you also want to save the figtext when saving the image, change the bbox_inches:
plt.savefig('fig.png', bbox_inches='tight')
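Putting the pieces together for the original loop over several columns, a small helper might look like the sketch below. The function name hist_with_stats, the synthetic DataFrame and the 0.7 / 0.72 layout values are assumptions for the example. A monospace font is used because to_string() pads its columns with spaces, so fixed-width characters keep them lined up.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line for interactive use
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Synthetic stand-in for the numerical features of the real DataFrame
df = pd.DataFrame({
    "n1": rng.normal(size=200),
    "n2": rng.uniform(0, 10, size=200),
})

def hist_with_stats(frame, column):
    """Draw a histogram of one column with its describe() table on the right."""
    fig, ax = plt.subplots()
    fig.subplots_adjust(right=0.7)  # leave room for the text block
    ax.hist(frame[column])
    ax.set_title(column)
    # to_string() yields a plain str; passing the Series itself raises the
    # "truth value of a Series is ambiguous" error from the question
    stats_text = frame[column].describe().to_string()
    fig.text(0.72, 0.5, stats_text, family="monospace", va="center")
    return fig, stats_text

results = {col: hist_with_stats(df, col) for col in df.columns}
```

Because the text sits inside the figure area here, it should also survive a plain savefig() call.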
Added this based on feedback and it works fine now.
for i in num_cols:
    # calculate the number of bins first, based on the Freedman-Diaconis rule
    n_counts = df[i].value_counts().sum()
    iqr = df[i].quantile(0.75) - df[i].quantile(0.25)
    h = 2 * iqr * (n_counts ** (-1/3))  # bin width: 2 * IQR / n^(1/3)
    n_bins = int(np.ceil((df[i].max() - df[i].min()) / h))
    fig, ax = plt.subplots()
    plt.hist(df[i], bins=n_bins)
    plt.figtext(1, 0.5, s=df[i].describe().to_string())
    plt.show()
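For reference, the Freedman-Diaconis rule sets the bin width to 2 * IQR * n^(-1/3), and the number of bins is the data range divided by that width. NumPy can apply the rule itself through bins="fd", and plt.hist accepts the same string. The sketch below compares a manual calculation with NumPy's on synthetic data; the variable names are made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
values = rng.normal(size=500)  # synthetic stand-in for one numeric column

# Freedman-Diaconis bin width: h = 2 * IQR * n^(-1/3)
q75, q25 = np.percentile(values, [75, 25])
h = 2.0 * (q75 - q25) * values.size ** (-1.0 / 3.0)

# Number of bins: data range divided by the bin width, rounded up
n_bins_manual = int(np.ceil((values.max() - values.min()) / h))

# NumPy applies the same rule when asked for bins="fd"
edges = np.histogram_bin_edges(values, bins="fd")
n_bins_numpy = len(edges) - 1
```

In the plotting loop this would reduce to plt.hist(df[i], bins="fd"), without computing the width by hand.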
total_income_language = pd.DataFrame(df.groupby('movie_facebook_likes')['gross'].sum())
average_income_language = pd.DataFrame(df.groupby('movie_facebook_likes')['gross'].mean())
d = {'mean':'Average Income','sum':'Total Income'}
df1 = df.groupby('movie_facebook_likes')['gross'].agg(['sum','mean']).rename(columns=d)
ax = df1.plot.bar()
ax.set(xlabel='Facebook Likes', ylabel='Dollar Values(Gross)')
So, the code I have above does a good job of plotting a bar graph. But when I tried to make it into a scatter plot by changing .bar() to .scatter(), it gave me the following error:
What do I need to fix to make it a scatter plot?
Expected Output:
As the error tells you, you need to specify x and y arguments which represent the coordinates of your points.
For example (assuming that there are movie_facebook_likes and gross columns in your DataFrame, which is the case for df but not for the grouped df1, where movie_facebook_likes is the index):
ax = df.plot.scatter(x='movie_facebook_likes', y='gross')
See the documentation:
x, y : label or position, optional
Coordinates for each point.
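To scatter the aggregated values rather than the raw rows, the group key first has to become a regular column again, because after groupby it is the index of df1. A minimal sketch with invented numbers; df1_flat is a name made up for the example:

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # render off-screen; remove this line for interactive use
import matplotlib.pyplot as plt

# Invented data with the two columns used in the question
df = pd.DataFrame({
    "movie_facebook_likes": [0, 0, 100, 100, 500],
    "gross": [1.0, 3.0, 10.0, 20.0, 50.0],
})

d = {"mean": "Average Income", "sum": "Total Income"}
df1 = df.groupby("movie_facebook_likes")["gross"].agg(["sum", "mean"]).rename(columns=d)

# scatter needs columns for x and y, so move the group index back to a column
df1_flat = df1.reset_index()
ax = df1_flat.plot.scatter(x="movie_facebook_likes", y="Total Income")
ax.set(xlabel="Facebook Likes", ylabel="Dollar Values (Gross)")
```

A second call with y="Average Income" and ax=ax would draw the averages into the same axes.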