How can I make a scatter Plot with what I have? - pandas

total_income_language = pd.DataFrame(df.groupby('movie_facebook_likes')['gross'].sum())
average_income_language = pd.DataFrame(df.groupby('movie_facebook_likes')['gross'].mean())
d = {'mean':'Average Income','sum':'Total Income'}
df1 = df.groupby('movie_facebook_likes')['gross'].agg(['sum','mean']).rename(columns=d)
ax = df1.plot.bar()
ax.set(xlabel='Facebook Likes', ylabel='Dollar Values(Gross)')
So, the code above does a good job of plotting a bar graph. But when I tried to turn it into a scatter plot by changing .bar() to .scatter(), it gave me the following error:
What do I need to fix to make it a scatter plot?

As the error tells you, you need to specify x and y arguments which represent the coordinates of your points.
For example (assuming that there are movie_facebook_likes and gross columns in your DataFrame):
ax = df1.plot.scatter(x='movie_facebook_likes', y='gross')
See the documentation:
x, y : label or position, optional
Coordinates for each point.
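Note that in the question's df1, movie_facebook_likes becomes the index after the groupby, and the value columns are renamed to 'Total Income' and 'Average Income'. A minimal sketch of one way to scatter-plot that aggregated frame follows; the small df and its values are made up for illustration and stand in for the real data:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical stand-in for the question's DataFrame
df = pd.DataFrame({
    "movie_facebook_likes": [0, 0, 100, 100, 500],
    "gross": [1.0e6, 3.0e6, 2.0e6, 4.0e6, 9.0e6],
})

d = {"mean": "Average Income", "sum": "Total Income"}
df1 = df.groupby("movie_facebook_likes")["gross"].agg(["sum", "mean"]).rename(columns=d)

# reset_index() turns the group key back into a regular column,
# so it can be passed to scatter() by name
plot_df = df1.reset_index()
ax = plot_df.plot.scatter(x="movie_facebook_likes", y="Total Income")
ax.set(xlabel="Facebook Likes", ylabel="Dollar Values (Gross)")
```

Swapping y for "Average Income" plots the per-group mean instead.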

Related

Remove thousands(k) in plotly line plot

I have a plotly line plot whose x axis is year and week as an integer. The data is correct for year-weeks 202101, 202102, 202103, but the plot shows them as 202.1K, 202.1K, 202.1K. I would like the plot axis to show 202101, 202102, 202103 as well. Below is my code.
if chart_choice == 'line':
    print(dff['week'])
    dff = dff.groupby(['product','week'], as_index=False)[['CYSales']].sum()
    fig = px.line(dff, x='week', y=num_value, color='product')
    return fig
thanks for help
Does tickformat = "000" help, perhaps? Have a look here: Plotly for R: Remove the k that appears on the y axis when the dataset contains numbers larger than 1000

Adding error_y from two columns in a stacked bar graph, plotly express

I have created a stacked bar plot using plotly.express. Each x-axis category has two corresponding y-values that are stacked to give the total of the two combined.
How can I add an individual error bar for each Y-value?
I have tried several options that all yield the same result: the same value is added to both stacked bars. The error_y values are found in two separate columns in the dataframe: "st_dev_PHB_%" and "st_dev_PHV_%", respectively, which correspond to 6 categorical values (x="C").
My intuition tells me it's best to merge them into a new column in the dataframe, since I load the dataframe in the bar plot. However, each solution I try either gives an error or adds the same value to each pair of y-values.
It would be nice if it were possible to have X error_y values corresponding to the X variables loaded in y=[...,...]. But that would of course be too easy...
data_MM = read_csv(....)
#data_MM["error_bar"] = data_MM[['st_dev_PHB_%', 'st_dev_PHV_%']].apply(tuple, axis=1).tolist()
#This one adds the values together instead of adding them to same list.
#data_MM["error_bar"] = data_MM['st_dev_PHB_%'] + data_MM['st_dev_PHV_%']
#data_MM["error_bar"] = data_MM[["st_dev_PHB_%", "st_dev_PHV_%"]].values.tolist()
#data_MM["error_bar"] = list(zip(data_MM['st_dev_PHB_%'],data_MM['st_dev_PHV_%']))
bar_plot = px.bar(data_MM, x="C", y=["PHB_wt%", "PHV_wt%"], hover_data =["PHA_total_wt%"], error_y="error_bar")
bar_plot.show()
The most commonly endured error message:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
I see your problem with the same error bar being used in both bars in the stack. However, I got a working solution with plotly.graph_objs. The only downside is that the second bar is plotted in front, so the upper half of the lower error bar is covered. At least you can still read off the error value from the hover data.
Here is the full code:
import numpy as np
import plotly.graph_objs as go

n = 20
x = list(range(1, n + 1))
y1 = np.random.random(n)
y2 = y1 + np.random.random(n)
e1 = y1 * 0.2
e2 = y2 * 0.05
trace1 = go.Bar(x=x, y=y1, error_y=dict(type='data', array=e1), name="Trace 1")
trace2 = go.Bar(x=x, y=y2, error_y=dict(type='data', array=e2), name="Trace 2")
fig = go.Figure(data=[trace1, trace2])
fig.update_layout(title="Test Plot", xaxis_title="X axis", yaxis_title="Y axis", barmode="stack")
fig.show()
Here is a resulting plot (top plot showing one error value, bottom plot showing different error value for the same bar stack):

How to get the array corresponding exactly to contourf?

I have a rather complicated two-dimensional function that I represent with a few levels using contourf. How do I get the array corresponding exactly to the filled contours?
For example :
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(0,1,100)
y = np.linspace(0,1,100)
[X,Y] = np.meshgrid(x,y)
Z = X**2 + Y**2
plt.subplot(1,2,1)
plt.imshow(Z, origin = 'lower')
plt.subplot(1,2,2)
plt.contourf(Z,[0,1,2])
plt.show()
plt.savefig('test.png')
I'd like to have the array from the contourplot that returns constant values between the different contours.
So far I did some thresholding:
test = Z
test[test<1] = 0
test[test>=1] = 2
plt.contourf(X,Y,test,[0,1,2])
plt.savefig('test1.png')
But contourf does a much better job at interpolating things. Furthermore, thresholding 'by hand' becomes a bit long when I have multiple contours.
I guess that, because contourf does all the work, there is a way to get this from the contourf object?
P.S.: why does my first piece of code produce subplots of different sizes?
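One way to build such a piecewise-constant array without thresholding each level by hand is np.digitize, which assigns every value the index of the level interval it falls in. This is only a sketch: it classifies the grid values point by point and does not reproduce the interpolated boundaries that contourf draws between grid points. The levels are the ones from the example above; the names idx and banded are introduced here for illustration.

```python
import numpy as np

x = np.linspace(0, 1, 100)
y = np.linspace(0, 1, 100)
X, Y = np.meshgrid(x, y)
Z = X**2 + Y**2

levels = np.array([0, 1, 2])

# Interval index per value: levels[i-1] <= z < levels[i] gives i
idx = np.digitize(Z, levels)

# Clip so that values equal to the top level (or outside the range)
# land in the nearest band
idx = np.clip(idx, 1, len(levels) - 1)

# Replace every value by the lower bound of its band
banded = levels[idx - 1]
```

Because Z is left untouched, banded can be compared with the original field, for example with plt.imshow(banded, origin='lower').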

scatter plot pandas one variable on x axis and y axis is dataframe index

I want to make a scatter plot in pandas where the x axis is the mean and the y axis is the index of the data frame, but I couldn't get it to work. This is my code; I got a lot of errors.
y=list(range(len(df.index)))
df.plot(kind='scatter', x='meandf', y )
error : SyntaxError: positional argument follows keyword argument
Try the following:
y=list(range(len(df.index)))
df.meandf.plot(x='meandf', y=y)
Or more concisely since you're plotting a Series:
df.meandf.plot(y=y)
If you need to maintain kind = 'scatter' you'll need to pass a dataframe:
df['y'] = y # create y column for the list you created
df.plot(x='meandf', y='y', kind='scatter', marker='.')
df.drop('y', axis=1, inplace=True)
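An alternative that avoids adding and dropping a temporary column is to expose the index as a column with reset_index(). A minimal sketch, assuming a small made-up frame in which only the column name meandf comes from the question:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import pandas as pd

# Hypothetical data; only the column name meandf is from the question
df = pd.DataFrame({"meandf": [2.5, 3.1, 1.8, 4.0]})

# For an unnamed index, reset_index() adds a column called "index"
# holding the old index values; df itself is not modified
ax = df.reset_index().plot.scatter(x="meandf", y="index")
```

If the index has a name, that name replaces "index" in the y argument.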

How do I add error bars on a histogram?

I've created a histogram to see the number of similar values in a list.
data = np.genfromtxt("Pendel-Messung.dat")
stdm = (np.std(data))/((700)**(1/2))
breite = 700**(1/2)
fig2 = plt.figure()
ax1 = plt.subplot(111)
ax1.set_ylim(0,150)
ax1.hist(data, bins=breite)
ax2 = ax1.twinx()
ax2.set_ylim(0,150/700)
plt.show()
I want to create error bars (the error being stdm) in the middle of each bar of the histogram. I know I can create errorbars using
plt.errorbar("something", data, yerr = stdm)
But how do I make them start in the middle of each bar? I thought of just adding breite/2, but that gives me an error.
Sorry, I'm a beginner! Thank you!
ax.hist returns the frequencies (n) and the bin edges, so we can use those for y and x in the call to errorbar. Also, the bins argument to hist takes either an integer number of bins or a sequence of bin edges. I think you were trying to give a bin width of breite? If so, this should work (you just need to select an appropriate xmax):
n,bin_edges,patches = ax.hist(data,bins=np.arange(0,xmax,breite))
x = bin_edges[:-1]+breite/2.
ax.errorbar(x,n,yerr=stdm,linestyle='None')
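If breite was instead meant as the number of bins, the bar midpoints can be computed directly from the returned edges, which works for any binning. The following is a self-contained sketch under that assumption; the random data stands in for Pendel-Messung.dat, and the names n_bins and centers are introduced here for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical measurements standing in for Pendel-Messung.dat
rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=0.1, size=700)

stdm = np.std(data) / np.sqrt(len(data))
n_bins = int(np.sqrt(len(data)))  # a bin count must be an integer

fig, ax = plt.subplots()
n, bin_edges, patches = ax.hist(data, bins=n_bins)

# Midpoint of each bar: average of its left and right edge
centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
ax.errorbar(centers, n, yerr=stdm, linestyle="None", color="k")
```

Averaging adjacent edges avoids having to know the bin width in advance, so the same lines work with unevenly spaced bins.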