How to superimpose a barchart on a multiplot in Python - matplotlib

I have code that plots two charts side by side. I want to additionally plot a bar chart of another quantity on one of the two plots. The commented-out lines in my code show where the bar chart would be added. I want the bar chart on a twin axis (with its y label and limits on the right of the plot). Currently, calling twinx() on the result of a 1-by-2 plt.subplots() call raises an error. My code is below:
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter

def result_variability_onerow(variable1,variable2, yr):
    scenarios_PSN = {'Low Snow': 3, 'Medium Snow': 15, 'High Snow': 46}
    scenarios_TSN = {'Low Snow': 46, 'Medium Snow': 25, 'High Snow': 3}
    date_form = DateFormatter("%m-%y")
    plt.close()
    fig, ax = plt.subplots(1 , 2, figsize = [15,5])
    # ax2 = ax.twinx()
    # ax2.set_ylim(4, 20)
    for key, value in scenarios_PSN.items():
        p = snow_vary[str(yr) + '_' + str(value)][150:250]
        ax[0].plot(p[variable1], label= str(key))
    ax[0].set_ylabel(str(variable1))
    ax[0].xaxis.set_major_formatter(date_form)
    ax[0].grid()
    ax[0].legend()
    for key, value in scenarios_TSN.items():
        t = temp_vary[str(yr) + '_' + str(value)][150:250]
        ax[1].plot(t[variable1], label= str(key))
    ax[1].xaxis.set_major_formatter(date_form)
    ax[1].grid()
    # ax[1].bar(t.index, t[variable2], label= "Precipitation")
    ax[0].set_title(variable1 + "Phase Change")
    ax[1].set_title(variable1 + "Temperature Change")
    ax[0].set_ylim(0,180)
    ax[1].set_ylim(0,180)
    ax[0].set_ylabel("Streamflow (mm)")
    plt.savefig('behaviour.pdf', format = 'pdf', bbox_inches = 'tight')
    print(str(variable1) + ' for the year ' + str(yr))

result_variability_onerow('streamflow','precip', 2005)
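A likely cause of the error: plt.subplots(1, 2) returns a NumPy array of two Axes objects, and the array itself has no twinx() method, so the call has to go on a single Axes such as ax[1]. The following is a minimal sketch of that idea, not a drop-in fix. The dates, streamflow and precip values are made-up stand-ins for the snow_vary/temp_vary data, and the right-hand limits (0 to 20) are an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in data (not the asker's snow_vary / temp_vary frames).
dates = pd.date_range("2005-06-01", periods=100, freq="D")
streamflow = 90 + 60 * np.sin(np.linspace(0, 3 * np.pi, 100))
precip = np.abs(10 * np.cos(np.linspace(0, 6 * np.pi, 100)))

fig, ax = plt.subplots(1, 2, figsize=[15, 5])  # ax is an array of two Axes
for a in ax:
    a.plot(dates, streamflow, label="Low Snow")
    a.set_ylim(0, 180)
    a.grid()
ax[0].set_ylabel("Streamflow (mm)")

# twinx() is called on one Axes, not on the array returned by subplots().
ax2 = ax[1].twinx()
ax2.bar(dates, precip, width=1.0, alpha=0.3, color="gray", label="Precipitation")
ax2.set_ylim(0, 20)
ax2.set_ylabel("Precipitation")  # drawn on the right-hand side
```

Only the right-hand panel gets the secondary y axis; the left panel is unchanged.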

Related

How to expand bars over the month on the x-axis while being the same width?

for i in range(len(basin)):
    prefix = "URL here"
    state = "OR"
    basin_name = basin[i]
    df_orig = pd.read_csv(f"{prefix}/{basin_name}.csv", index_col=0)
    #---create date x-index
    curr_wy_date_rng = pd.date_range(
        start=dt(curr_wy-1, 10, 1),
        end=dt(curr_wy, 9, 30),
        freq="D",
    )
    if not calendar.isleap(curr_wy):
        print("dropping leap day")
        df_orig.drop(["02-29"], inplace=True)
    use_cols = ["Median ('91-'20)", f"{curr_wy}"]
    df = pd.DataFrame(data=df_orig[use_cols].copy())
    df.index = curr_wy_date_rng
    #--create EOM percent of median values-------------------------------------
    curr_wy_month_rng = pd.date_range(
        start=dt(curr_wy-1, 10, 1),
        end=dt(curr_wy, 6, 30),
        freq="M",
    )
    df_monthly_prec = pd.DataFrame(data=df_monthly_basin[basin[i]].copy())
    df_monthly_prec.index = curr_wy_month_rng
    df_monthly = df.groupby(pd.Grouper(freq="M")).max()
    df_monthly["date"] = df_monthly.index
    df_monthly["wy_date"] = df_monthly["date"].apply(lambda x: cal_to_wy(x))
    df_monthly.index = pd.to_datetime(df_monthly["wy_date"])
    df_monthly.index = df_monthly["date"]
    df_monthly["month"] = df_monthly["date"].apply(
        lambda x: calendar.month_abbr[x.month]
    )
    df_monthly["wy"] = df_monthly["wy_date"].apply(lambda x: x.year)
    df_monthly.sort_values(by="wy_date", axis=0, inplace=True)
    df_monthly.drop(
        columns=[i for i in df_monthly.columns if "date" in i], inplace=True
    )
    # df_monthly.index = df_monthly['month']
    df_merge = pd.merge(df_monthly,df_monthly_prec,how='inner', left_index=True, right_index=True)
    #---Subplots---------------------------------------------------------------
    fig, ax = plt.subplots(figsize=(8,4))
    ax.plot(df_merge.index, df_merge["Median ('91-'20)"], color="green", linewidth="1", linestyle="dashed", label = 'Median Snowpack')
    ax.plot(df_merge.index, df_merge[f'{curr_wy}'], color='red', linewidth='2',label='WY Current')
    #------Setting x-axis range to expand bar width for ax2
    ax.bar(df_merge.index,df_merge[basin[i]], color = 'blue', label = 'Monthly %')
    #n = n + 1
    #--format chart
    ax.set_title(chart_name[w], fontweight = 'bold')
    w = w + 1
    ax.set_ylabel("Basin Precipitation Index")
    ax.set_yticklabels([])
    ax.margins(x=0)
    ax.legend()
    #plt.xlim(0,9)
    #---Setting date format
    ax.xaxis.set_major_locator(mdates.MonthLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
    #---EXPORT
    plt.show()
Desired end result: plot the monthly DataFrame (df_monthly_prec) together with the daily DataFrame, which currently charts only its end-of-month values (df_monthly). The bars for the monthly DataFrame should ideally span the whole month on the chart.
I have tried creating a secondary axis, but had trouble aligning the times for the primary and secondary axes. Ideally, I would like to replace plotting df_monthly with df (showing all daily data instead of just the end-of-month values within the daily dataset).
Any assistance or pointers would be much appreciated! Apologies if additional clarification is needed.
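One possible approach, sketched under assumptions: anchor each bar at the first day of its month, set its width to the number of days in that month, and use align="edge" so the bar covers the month exactly. Because the daily line then shares the same date axis, no secondary axis is needed. The data below are invented, and the names month_starts, monthly_pct and daily are hypothetical stand-ins for the real df_monthly_prec and df. Dates are converted with mdates.date2num so that the bar widths are unambiguously in days.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: one value per month and one value per day.
month_starts = pd.date_range("2022-10-01", periods=3, freq="MS")
monthly_pct = np.array([80.0, 110.0, 95.0])
days = pd.date_range("2022-10-01", "2022-12-31", freq="D")
daily = np.linspace(0.0, 120.0, len(days))

fig, ax = plt.subplots(figsize=(8, 4))

# Bars start on the first of the month and are as wide as the month (in days).
bar_x = mdates.date2num(month_starts.to_pydatetime())
bar_widths = month_starts.days_in_month.to_numpy()
bars = ax.bar(bar_x, monthly_pct, width=bar_widths, align="edge",
              color="blue", edgecolor="white", label="Monthly %")

# The daily series goes on the same axes, using the same date numbers.
ax.plot(mdates.date2num(days.to_pydatetime()), daily, color="red", label="Daily")

ax.xaxis_date()  # interpret the numeric x values as dates
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
ax.margins(x=0)
ax.legend()
```

If the monthly index holds month-end dates (as freq="M" produces), it would first need to be shifted to month starts for this to line up.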

geom_bar for total counts of binned continuous variable

I'm really struggling to achieve what feels like an incredibly basic geom_bar plot. I would like the sum of y to be represented by one solid bar (with colour = black outline) in bins of 10 for x. I know that stat = "identity" is what creates the unnecessary individual blocks in each bar, but I can't find an alternative that gets me to what is so close to my end goal. I cheated and made the desired plot below in Illustrator.
I don't really want to code x as a factor for the bins, as I want to keep the format of the axis ticks and text rather than having text such as "0 -10", "10 -20" etc. Is there a way to do this in ggplot without using the summarise or cut functions on the raw data? I am also aware of the geom_col and stat_count options but, again, can't achieve my desired outcome.
DF as below, where y = counts at various values of a continuous variable x. Also a factor variable of type.
y = c(1 ,1, 3, 2, 1, 1, 2, 1, 1, 1, 1, 1, 4, 1, 1,1, 2, 1, 2, 3, 2, 2, 1)
x = c(26.7, 28.5, 30.0, 34.8, 35.0, 36.4, 38.6, 40.0, 42.1, 43.7, 44.1, 45.0, 45.5, 47.4, 48.0, 57.2, 57.8, 64.2, 65.0, 66.7, 68.0, 74.4, 94.1)
type = c(rep("Type 1", 20), "Type 2", rep("Type 1", 2))
df<-data.frame(x,y,type)
Bar plot of total y count for each bin of x - trying to fill by total of type, but getting individual proportions as shown by line colour = black. Would like total for each type in each bar.
ggplot(df,aes(y=y, x=x))+
geom_bar(stat = "identity",color = "black", aes(fill = type))+
scale_x_binned(limits = c(20,100))+
scale_y_continuous(expand = c(0, 0), breaks = seq(0,10,2)) +
xlab("")+
ylab("Total Count")
Or trying to just have the total count within each bin but don't want the internal lines in the bars, just the outer colour = black for each bar
ggplot(df,aes(y=y, x=x))+
geom_col(fill = "#00C3C6", color = "black")+
scale_x_binned(limits = c(20,100))+
scale_y_continuous(expand = c(0, 0), breaks = seq(0,10,2)) +
xlab("")+
ylab("Total Count")
Here is one way to do it, by transforming the data first and then using geom_col:
library(dplyr)
library(ggplot2)
# Assign each x to the lower edge of its 10-wide bin, then total y per bin and type
df <- df |>
  mutate(bins = floor(x/10) * 10) |>
  group_by(bins, type) |>
  summarise(y = sum(y), .groups = "drop")
ggplot(data = df,
       aes(y = y,
           x = bins))+
  geom_col(aes(fill = type),
           color = "black")+
  scale_x_continuous(breaks = seq(0,100,10)) +
  scale_y_continuous(expand = c(0, 0),
                     breaks = seq(0,10,2)) +
  xlab("")+
  ylab("Total Count")

matplotlib - Plot 3 histograms with percentage on y axis

I am trying to plot 3 histograms on the same plot with percentage on the y axis, but I am not sure how.
You can see what I am trying to do, but I can't get there! The bars don't scale properly and I'm not sure how to fix that. Please help.
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.ticker as tkr

data_1 = [1,1,2,3,4,4,5,6,7,7,7,8,9]
data_2 = [4,2,3,4,1,1,6,8,9,9,9,5,6]
data_3 = [1,2,3,4,5,6,7,8,9,9,4,3,7,1,2,3,2,5,7,3,4,7,3,8,2,3,4,7,2]
bin = [1,10,1]
bin[1] += bin[2]*2
bin[0] -= bin[2]
bins_list = np.arange(bin[0],bin[1],bin[2])
fig, ax = plt.subplots(figsize=(15, 10))
counts, bins, patches = plt.hist((np.clip(data_1 , bins_list[0], bins_list[-1]),np.clip(data_2, bins_list[0], bins_list[-1]),np.clip(data_3, bins_list[0], bins_list[-1])),
                                 rwidth=0.8,
                                 bins=bins_list, color=('blue', 'red', 'green'),
                                 weights=None)
xlabels = bins[1:].astype(str)
xlabels[-1] = xlabels[-2] + '>'
xlabels[0] = xlabels[0] + '<'
for lb in range(1, len(xlabels)-1):
    xlabels[lb] = str(bins_list[lb])+'-'+xlabels[lb]
N_labels = len(xlabels)
plt.xticks(bin[2] * np.arange(N_labels)+bin[2]/2 + bin[0], rotation=300)
ax.set_xticklabels(xlabels)
total = 0
''' Add percentages and value for each bar '''
for c in range(len(patches)):
    for count, p in zip(counts[c], patches[c]):
        percentage = '%0.2f%% ' % (100 * float(count) / counts[c].sum())
        total += 100 * float(count) / counts[c].sum()
        x1 = p.get_x()
        y1 = p.get_y() + p.get_height()
        ax.annotate(percentage, (x1, y1), rotation=270, fontsize = 10)
ax.yaxis.set_major_formatter(tkr.PercentFormatter(xmax=len(data_3)))
ax.grid(axis='y', color='r', linestyle=':')
ax.set_title("Please help")
ax.set_axisbelow(True)
fig.tight_layout()
Plot Result of code above
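You can pass density=True to plt.hist. This normalizes each dataset separately so that its bars integrate to 1; with the unit-width bins used here, each bar height is the fraction of that dataset falling in the bin: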
counts, bins, patches = plt.hist((np.clip(data_1 , bins_list[0], bins_list[-1]),
np.clip(data_2, bins_list[0], bins_list[-1]),
np.clip(data_3, bins_list[0], bins_list[-1])),
rwidth=0.8,
density=True, # here
bins=bins_list, color=('blue', 'red', 'green'),
weights=None)
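An alternative that does not depend on the bin width, offered as a sketch rather than the answerer's method: give every observation a weight of 100 / len(dataset), so that each dataset's bars sum to 100 and the y axis reads directly in percent. The bin edges are the ones the question's code ends up with (0 through 11); the labels are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.ticker as tkr
import numpy as np

data_1 = [1, 1, 2, 3, 4, 4, 5, 6, 7, 7, 7, 8, 9]
data_2 = [4, 2, 3, 4, 1, 1, 6, 8, 9, 9, 9, 5, 6]
data_3 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 4, 3, 7, 1, 2, 3, 2, 5, 7,
          3, 4, 7, 3, 8, 2, 3, 4, 7, 2]
datasets = [np.asarray(d) for d in (data_1, data_2, data_3)]
bins_list = np.arange(0, 12, 1)  # same edges as in the question

# Each observation contributes 100 / n percent of its own dataset.
weights = [np.full(len(d), 100.0 / len(d)) for d in datasets]

fig, ax = plt.subplots(figsize=(15, 10))
counts, bins, patches = ax.hist(datasets, bins=bins_list, weights=weights,
                                rwidth=0.8, color=("blue", "red", "green"),
                                label=("data_1", "data_2", "data_3"))

# Bar heights are already percentages, so 100 on the axis means 100%.
ax.yaxis.set_major_formatter(tkr.PercentFormatter(xmax=100))
ax.legend()
fig.tight_layout()
```

With this weighting the returned counts are the percentages themselves, so they can be used directly for bar annotations.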

Problem with alignment of geom_point and geom_errorbar

I am trying to plot how different predictors associate with stroke and underlying phenotypes (i.e. cholesterol). I originally had working ggplot code in which shape denoted the variable (stroke, HDL cholesterol and total cholesterol) and colour denoted the type (i.e. disease (stroke) or phenotype (HDL/total cholesterol)). To make the plot more intuitive, I want to swap shape and colour around, but when I do this I have issues with position dodge and the alignment of geom_point and geom_errorbar:
stroke_graph <- ggplot(stroke,aes(y=as.numeric(stroke$test),
x=Clock,
shape = Type,
colour = Variable)) +
geom_point(data=stroke, aes(shape=Type, colour=Variable), show.legend=TRUE,
position=position_dodge(width=0.5), size = 3) +
geom_errorbar(aes(ymin = as.numeric(stroke$LCI), ymax= as.numeric(stroke$UCI)),
position = position_dodge(0.5), width = 0.05,
colour ="black")+
ylab("standardised beta/log odds")+ xlab ("")+
geom_hline(yintercept = 0, linetype = "dotted")+
theme(axis.text.x = element_text(size = 10, vjust = 0.5), legend.position = "none",
plot.title = element_text(size = 12))+
scale_y_continuous(limit = c(-0.402, 0.7))+ scale_shape_manual(values=c(15, 17, 18))+
theme(legend.position="right") + labs(shape = "Variable") + guides(shape = guide_legend(reverse=TRUE)) +
coord_flip()
stroke_graph + ggtitle("Stroke and Associated Phenotypes") + theme(plot.title = element_text(hjust = 0.5))
Graph now: 1
Previously working graph - only difference in code is swapping "Type" and "Variable": 2

'matplotlib.pyplot' has no attribute 'autofmt_xdate'

A project I previously submitted for a course worked as expected. I went back to run the code again and now get a Python traceback error that didn't occur before:
'matplotlib.pyplot' has no attribute 'autofmt_xdate'
I loaded the weather station data files and ran all the code, which previously worked. Below is the code for the visualization plot:
plt.figure()
plt.plot(minmaxdf.loc[:,'Month-Day'], minmaxdf.loc[:,'min_tmps'] ,'-', c = 'cyan', linewidth=0.5, label = '10yr record lows')
plt.plot(minmaxdf.loc[:,'Month-Day'], minmaxdf.loc[:,'max_tmps'] , '-', c = 'orange', linewidth=0.5, label = '10yr record highs')
plt.gca().fill_between(range(len(minmaxdf.loc[:,'min_tmps'])), minmaxdf['min_tmps'], minmaxdf['max_tmps'], facecolor = (0.5, 0.5, 0.5), alpha = 0.5)
plt.scatter(minbreach15.loc[:,'Month-Day'], minbreach15.loc[:,'min_tmps_breach15'], s = 10, c = 'blue', label = 'Record low broken - 2015')
plt.scatter(maxbreach15.loc[:,'Month-Day'], maxbreach15.loc[:,'max_tmps_breach15'], s = 10, c = 'red', label = 'Record high broken - 2015')
plt.xlabel('Month')
plt.ylabel('Temperature (Tenths of Degrees C)')
plt.title('10yr Max/Min Temperature Range for Wilton CT 06897')
plt.gca().axis([0, 400, -500, 500])
plt.xticks(range(0, len(minmaxdf.loc[:,'Month-Day']), 30), minmaxdf.loc[:,'Month-Day'].index[range(0, len(minmaxdf.loc[:,'Month-Day']), 30)], rotation = '-45')
plt.xticks( np.linspace(0, 15 + 30*11 , num = 12), (r'Jan', r'Feb', r'Mar', r'Apr', r'May', r'Jun', r'Jul', r'Aug', r'Sep', r'Oct', r'Nov', r'Dec') )
plt.legend(loc = 4, frameon = False)
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)
plt.autofmt_xdate()
plt.show()
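The code previously produced a chart, by day of year, of the 10-year (2004-14) record high and low temperatures, overlaid with scatter points for the 2015 highs and lows that broke those records.
autofmt_xdate() is a method of Figure, not a function in matplotlib.pyplot. The call therefore needs to be made on the current figure: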
plt.gcf().autofmt_xdate()
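The same idea in the object-oriented style, as a small sketch: keep the Figure returned by plt.subplots() and call autofmt_xdate() on it. The dates and temperatures below are made up, and rotation=45 is only an example value.

```python
import datetime as dt
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical data: twelve dates roughly one month apart.
days = [dt.date(2015, 1, 1) + dt.timedelta(days=30 * i) for i in range(12)]
temps = [-50, -20, 40, 110, 170, 230, 260, 250, 200, 130, 60, -10]

fig, ax = plt.subplots()
ax.plot(days, temps, "-", c="orange", linewidth=0.5)
ax.set_ylabel("Temperature (Tenths of Degrees C)")

# autofmt_xdate lives on the Figure; it rotates and right-aligns the date labels.
fig.autofmt_xdate(rotation=45)
```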