'matplotlib.pyplot' has no attribute 'autofmt_xdate' - matplotlib

A project I previously submitted for a course worked as expected. I went back to run the code again and now get a Python traceback with an error message that didn't occur before:
'matplotlib.pyplot' has no attribute 'autofmt_xdate'
I loaded the weather station data files and ran all the code, which previously worked. Below is the code for the visualization plot:
plt.figure()
plt.plot(minmaxdf.loc[:,'Month-Day'], minmaxdf.loc[:,'min_tmps'] ,'-', c = 'cyan', linewidth=0.5, label = '10yr record lows')
plt.plot(minmaxdf.loc[:,'Month-Day'], minmaxdf.loc[:,'max_tmps'] , '-', c = 'orange', linewidth=0.5, label = '10yr record highs')
plt.gca().fill_between(range(len(minmaxdf.loc[:,'min_tmps'])), minmaxdf['min_tmps'], minmaxdf['max_tmps'], facecolor = (0.5, 0.5, 0.5), alpha = 0.5)
plt.scatter(minbreach15.loc[:,'Month-Day'], minbreach15.loc[:,'min_tmps_breach15'], s = 10, c = 'blue', label = 'Record low broken - 2015')
plt.scatter(maxbreach15.loc[:,'Month-Day'], maxbreach15.loc[:,'max_tmps_breach15'], s = 10, c = 'red', label = 'Record high broken - 2015')
plt.xlabel('Month')
plt.ylabel('Temperature (Tenths of Degrees C)')
plt.title('10yr Max/Min Temperature Range for Wilton CT 06897')
plt.gca().axis([0, 400, -500, 500])
plt.xticks(range(0, len(minmaxdf.loc[:,'Month-Day']), 30), minmaxdf.loc[:,'Month-Day'].index[range(0, len(minmaxdf.loc[:,'Month-Day']), 30)], rotation = '-45')
plt.xticks( np.linspace(0, 15 + 30*11 , num = 12), (r'Jan', r'Feb', r'Mar', r'Apr', r'May', r'Jun', r'Jul', r'Aug', r'Sep', r'Oct', r'Nov', r'Dec') )
plt.legend(loc = 4, frameon = False)
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)
plt.autofmt_xdate()
plt.show()
The code previously produced a chart of the 10-year (2004-14) average temperature max/mins by day of year, overlaid with scatter points for the 2015 max/mins that exceeded those averages.

autofmt_xdate() is a method of the Figure, not a function in matplotlib.pyplot. The command hence needs to be called on the current figure:
plt.gcf().autofmt_xdate()
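The same fix can also be written with an explicit Figure object. The following is only a minimal sketch: the dates and temperatures are made up for illustration and are not the weather-station data from the question.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data: ten consecutive days with made-up temperatures.
days = [dt.date(2015, 1, 1) + dt.timedelta(days=i) for i in range(10)]
temps = [-50, -20, 0, 15, 30, 10, -5, -30, -45, -60]

fig, ax = plt.subplots()
ax.plot(days, temps, "-", c="cyan", linewidth=0.5)

# autofmt_xdate is defined on Figure, so call it on fig (or on plt.gcf()).
# With its default arguments it rotates the x tick labels and right-aligns them.
fig.autofmt_xdate()
```

Keeping a reference to the figure (fig) avoids relying on pyplot's notion of the "current" figure, which can be helpful once a script creates more than one.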

Related

How can I fix a mismatch while only one plot within the group is not showing?

The code plots the first seven locations but does not plot the last one. When it does plot all the locations, the bars look the same in every subplot instead of reflecting how many people reported each length of stay at that location. I have been trying to fix it, but in vain.
print('Total Hours Spent in 2019 \n', aqua.groupby('Location')['In hours, what was your typical length of stay in 2019?'].count().sort_values(ascending = False))
location = ['Harry Stone', 'Lake Highlands', 'Crawford', 'Samuell Grand', 'Kidd Springs', 'Tietze', 'Fretz', 'Exline']
plt.figure(figsize = (15, 25))
for l in location:
    ind = location.index(l)
    plt.subplot(4, 2, ind + 1)
    aquatic = aqua[aqua['Location'] == l]
    count = aquatic['In hours, what was your typical length of stay in 2019?'].value_counts()
    Index = [0, 1, 2, 3, 4]
    plt.bar(Index, count, color = ['orange', 'yellow', 'green', 'cyan'])
    plt.xticks(Index, ['2', '4', '3', 'nan', '0'])
    plt.xlabel('How Many Hours Spent')
    plt.ylabel('Answers Count')
    plt.title('Which Location People Spent the most time ' + l)
    for i in range(len(Index)):
        plt.text(i, Index[0], count[i], ha = 'right', va = 'bottom')
[Plot of all location based on time spent in location](https://i.stack.imgur.com/696P3.png)

How to expand bars over the month on the x-axis while being the same width?

for i in range(len(basin)):
    prefix = "URL here"
    state = "OR"
    basin_name = basin[i]
    df_orig = pd.read_csv(f"{prefix}/{basin_name}.csv", index_col=0)
    #---create date x-index
    curr_wy_date_rng = pd.date_range(
        start=dt(curr_wy-1, 10, 1),
        end=dt(curr_wy, 9, 30),
        freq="D",
    )
    if not calendar.isleap(curr_wy):
        print("dropping leap day")
        df_orig.drop(["02-29"], inplace=True)
    use_cols = ["Median ('91-'20)", f"{curr_wy}"]
    df = pd.DataFrame(data=df_orig[use_cols].copy())
    df.index = curr_wy_date_rng
    #--create EOM percent of median values-------------------------------------
    curr_wy_month_rng = pd.date_range(
        start=dt(curr_wy-1, 10, 1),
        end=dt(curr_wy, 6, 30),
        freq="M",
    )
    df_monthly_prec = pd.DataFrame(data=df_monthly_basin[basin[i]].copy())
    df_monthly_prec.index = curr_wy_month_rng
    df_monthly = df.groupby(pd.Grouper(freq="M")).max()
    df_monthly["date"] = df_monthly.index
    df_monthly["wy_date"] = df_monthly["date"].apply(lambda x: cal_to_wy(x))
    df_monthly.index = pd.to_datetime(df_monthly["wy_date"])
    df_monthly.index = df_monthly["date"]
    df_monthly["month"] = df_monthly["date"].apply(
        lambda x: calendar.month_abbr[x.month]
    )
    df_monthly["wy"] = df_monthly["wy_date"].apply(lambda x: x.year)
    df_monthly.sort_values(by="wy_date", axis=0, inplace=True)
    df_monthly.drop(
        columns=[i for i in df_monthly.columns if "date" in i], inplace=True
    )
    # df_monthly.index = df_monthly['month']
    df_merge = pd.merge(df_monthly,df_monthly_prec,how='inner', left_index=True, right_index=True)
    #---Subplots---------------------------------------------------------------
    fig, ax = plt.subplots(figsize=(8,4))
    ax.plot(df_merge.index, df_merge["Median ('91-'20)"], color="green", linewidth="1", linestyle="dashed", label = 'Median Snowpack')
    ax.plot(df_merge.index, df_merge[f'{curr_wy}'], color='red', linewidth='2',label='WY Current')
    #------Setting x-axis range to expand bar width for ax2
    ax.bar(df_merge.index,df_merge[basin[i]], color = 'blue', label = 'Monthly %')
    #n = n + 1
    #--format chart
    ax.set_title(chart_name[w], fontweight = 'bold')
    w = w + 1
    ax.set_ylabel("Basin Precipitation Index")
    ax.set_yticklabels([])
    ax.margins(x=0)
    ax.legend()
    #plt.xlim(0,9)
    #---Setting date format
    ax.xaxis.set_major_locator(mdates.MonthLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
    #---EXPORT
    plt.show()
End result desired: plot the monthly DataFrame (df_monthly_prec) together with the daily DataFrame, which currently charts only its end-of-month values (df_monthly). The bars for the monthly DataFrame should ideally span the whole month on the chart.
I have tried creating a secondary axis, but had trouble aligning the times for the primary and secondary axes. Ideally, I would like to replace plotting df_monthly with df (showing all daily data instead of just the end-of-month values within the daily dataset).
Any assistance or pointers would be much appreciated! Apologies if additional clarification is needed.
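One possible approach, sketched here under assumptions rather than taken from the question's real data: anchor each bar at the first day of its month, give it a width equal to that month's length, and use align="edge" so the bar runs from month start to month end. Daily values can then share the same date axis. The names daily, monthly_pct and month_starts and all values below are hypothetical stand-ins.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical daily series for October through June of one water year.
daily_index = pd.date_range("2022-10-01", "2023-06-30", freq="D")
daily = pd.Series(np.linspace(0.0, 100.0, len(daily_index)), index=daily_index)

# Hypothetical monthly values, indexed by the FIRST day of each month.
month_starts = pd.date_range("2022-10-01", "2023-06-01", freq="MS")
monthly_pct = pd.Series(np.linspace(40.0, 120.0, len(month_starts)), index=month_starts)

# One width per bar: the number of days in that month, expressed as timedeltas.
widths = pd.to_timedelta(month_starts.days_in_month, unit="D")

fig, ax = plt.subplots(figsize=(8, 4))
# align="edge" puts the left edge of each bar on the month start.
bars = ax.bar(month_starts, monthly_pct.to_numpy(), width=widths, align="edge",
              color="blue", alpha=0.4, edgecolor="black", label="Monthly %")
ax.plot(daily.index, daily.to_numpy(), color="red", linewidth=2, label="WY Current (daily)")

ax.margins(x=0)
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
ax.legend()
```

If the monthly data is indexed by month end, as with freq="M" in the question, the index would first need to be shifted to the month start for this layout to line up.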

How to superimpose a barchart on a multiplot in Python

I have code that plots two charts side by side. I want to additionally plot a bar chart of another quantity on one of the two plots. The commented-out lines in my code show what would plot this bar chart. But I want to plot this bar chart on a twin axis (with its y label and limits on the right of the plot). Currently, the twinx() command does not work on a 1 by 2 plot and gives an error. My code is below:
def result_variability_onerow(variable1,variable2, yr):
    scenarios_PSN = {'Low Snow': 3, 'Medium Snow': 15, 'High Snow': 46}
    scenarios_TSN = {'Low Snow': 46, 'Medium Snow': 25, 'High Snow': 3}
    date_form = DateFormatter("%m-%y")
    plt.close()
    fig, ax = plt.subplots(1 , 2, figsize = [15,5])
    # ax2 = ax.twinx()
    # ax2.set_ylim(4, 20)
    for key, value in scenarios_PSN.items():
        p = snow_vary[str(yr) + '_' + str(value)][150:250]
        ax[0].plot(p[variable1], label= str(key))
        ax[0].set_ylabel(str(variable1))
        ax[0].xaxis.set_major_formatter(date_form)
        ax[0].grid()
        ax[0].legend()
    for key, value in scenarios_TSN.items():
        t = temp_vary[str(yr) + '_' + str(value)][150:250]
        ax[1].plot(t[variable1], label= str(key))
        ax[1].xaxis.set_major_formatter(date_form)
        ax[1].grid()
        # ax[1].bar(t.index, t[variable2], label= "Precipitation")
    ax[0].set_title(variable1 + "Phase Change")
    ax[1].set_title(variable1 + "Temperature Change")
    ax[0].set_ylim(0,180)
    ax[1].set_ylim(0,180)
    ax[0].set_ylabel("Streamflow (mm)")
    plt.savefig('behaviour.pdf', format = 'pdf', bbox_inches = 'tight')
    print(str(variable1) + ' for the year ' + str(yr))
result_variability_onerow('streamflow','precip', 2005)
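plt.subplots(1, 2) returns a NumPy array holding two Axes objects, and the array itself has no twinx() method, which is presumably why the commented-out ax.twinx() call fails. A minimal sketch, using made-up numbers in place of snow_vary and temp_vary, that creates the twin axis from a single element of the array instead:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data standing in for the streamflow and precipitation series.
x = np.arange(10)
streamflow_left = x ** 2
streamflow_right = 150 - x ** 2
precip = np.linspace(5, 18, 10)

fig, ax = plt.subplots(1, 2, figsize=(15, 5))  # ax is an array of two Axes

ax[0].plot(x, streamflow_left, label="Low Snow")
ax[0].set_ylabel("Streamflow (mm)")
ax[0].legend()

ax[1].plot(x, streamflow_right, label="Low Snow")
ax[1].legend(loc="upper left")

# Call twinx() on ONE Axes (here the right-hand panel), not on the array.
ax2 = ax[1].twinx()
bars = ax2.bar(x, precip, alpha=0.3, color="tab:blue", label="Precipitation")
ax2.set_ylim(4, 20)
ax2.set_ylabel("Precipitation")  # label and ticks appear on the right side
ax2.legend(loc="upper right")
```

The twin axis shares the x-axis of ax[1] but keeps its own y limits, so the bar heights do not disturb the 0 to 180 range used for the line plots.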

geom_bar for total counts of binned continuous variable

I'm really struggling to achieve what feels like an incredibly basic geom_bar plot. I would like the sum of y to be represented by one solid bar (with a colour = black outline) in bins of 10 for x. I know that stat = "identity" is what is creating the unnecessary individual blocks in each bar, but I can't find an alternative that gets me to what is so close to my end goal. I cheated and made the desired plot below in Illustrator.
I don't really want to code x as a factor for the bins, as I want to keep the format of the axis ticks and text rather than having text such as "0 -10", "10 -20" etc. Is there a way to do this in ggplot without the need to use the summarise or cut functions on the raw data? I am also aware of the geom_col and stat_count options but, again, can't achieve my desired outcome.
DF as below, where y = counts at various values of a continuous variable x. There is also a factor variable, type.
y = c(1 ,1, 3, 2, 1, 1, 2, 1, 1, 1, 1, 1, 4, 1, 1,1, 2, 1, 2, 3, 2, 2, 1)
x = c(26.7, 28.5, 30.0, 34.8, 35.0, 36.4, 38.6, 40.0, 42.1, 43.7, 44.1, 45.0, 45.5, 47.4, 48.0, 57.2, 57.8, 64.2, 65.0, 66.7, 68.0, 74.4, 94.1)
type = c(rep("Type 1", 20), "Type 2", rep("Type 1", 2))
df<-data.frame(x,y,type)
Bar plot of total y count for each bin of x - trying to fill by total of type, but getting individual proportions as shown by line colour = black. Would like total for each type in each bar.
ggplot(df,aes(y=y, x=x))+
  geom_bar(stat = "identity",color = "black", aes(fill = type))+
  scale_x_binned(limits = c(20,100))+
  scale_y_continuous(expand = c(0, 0), breaks = seq(0,10,2)) +
  xlab("")+
  ylab("Total Count")
Or trying to just have the total count within each bin but don't want the internal lines in the bars, just the outer colour = black for each bar
ggplot(df,aes(y=y, x=x))+
  geom_col(fill = "#00C3C6", color = "black")+
  scale_x_binned(limits = c(20,100))+
  scale_y_continuous(expand = c(0, 0), breaks = seq(0,10,2)) +
  xlab("")+
  ylab("Total Count")
Here is one way to do it, transforming the data first and then using geom_col:
library(dplyr)
library(ggplot2)
# Sum y within each bin of width 10 (per type) before plotting
df <- df |>
  mutate(bins = floor(x/10) * 10) |>
  group_by(bins, type) |>
  summarise(y = sum(y))
ggplot(data = df,
       aes(y = y,
           x = bins))+
  geom_col(aes(fill = type),
           color = "black")+
  scale_x_continuous(breaks = seq(0,100,10)) +
  scale_y_continuous(expand = c(0, 0),
                     breaks = seq(0,10,2)) +
  xlab("")+
  ylab("Total Count")

Confidence Interval for large dataset

I would like to get a confidence interval for very large datasets. Each consists of around 700,000 points for x and y. I also tried using less data, around 200 points, and with that the confidence interval is plotted. But with my actual datasets, the confidence interval does not show.
For that, my code is based on:
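import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
x_x = np.array(y_test[:, 0]) #about 700,000 points
y_y = np.array(y_pred[:, 0]) #about 700,000 points
# Assumption: d was not defined in the snippet; here it is taken to be a DataFrame of the two arrays
d = pd.DataFrame({'x_x': x_x, 'y_y': y_y})
sns.set(style = 'whitegrid')
# 'size' was renamed to 'height' in seaborn 0.9
p = sns.FacetGrid(d, height = 4, aspect = 1.5)
p.map(plt.scatter, 'x_x', 'y_y', color = 'red')
p.map(sns.regplot, 'x_x', 'y_y', scatter = False, ci = 95,
      fit_reg = True, color = 'blue')
p.map(sns.regplot, 'x_x', 'y_y', scatter = False, ci = 0,
      fit_reg = True, color = 'darkgreen')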
And also the Figure so far: