I have code that plots two charts side by side. I want to additionally plot a bar chart of a second quantity on one of the two plots. The commented-out lines in my code show where the bar chart would be added. I want this bar chart on a twin axis, with its y label and limits on the right of the plot. Currently, the twinx() call does not work on a 1 by 2 plot and raises an error. My code is below:
def result_variability_onerow(variable1, variable2, yr):
    scenarios_PSN = {'Low Snow': 3, 'Medium Snow': 15, 'High Snow': 46}
    scenarios_TSN = {'Low Snow': 46, 'Medium Snow': 25, 'High Snow': 3}
    date_form = DateFormatter("%m-%y")
    plt.close()
    fig, ax = plt.subplots(1, 2, figsize=[15, 5])
    # ax2 = ax.twinx()
    # ax2.set_ylim(4, 20)
    for key, value in scenarios_PSN.items():
        p = snow_vary[str(yr) + '_' + str(value)][150:250]
        ax[0].plot(p[variable1], label=str(key))
    ax[0].set_ylabel(str(variable1))
    ax[0].xaxis.set_major_formatter(date_form)
    ax[0].grid()
    ax[0].legend()
    for key, value in scenarios_TSN.items():
        t = temp_vary[str(yr) + '_' + str(value)][150:250]
        ax[1].plot(t[variable1], label=str(key))
    ax[1].xaxis.set_major_formatter(date_form)
    ax[1].grid()
    # ax[1].bar(t.index, t[variable2], label= "Precipitation")
    ax[0].set_title(variable1 + "Phase Change")
    ax[1].set_title(variable1 + "Temperature Change")
    ax[0].set_ylim(0, 180)
    ax[1].set_ylim(0, 180)
    ax[0].set_ylabel("Streamflow (mm)")
    plt.savefig('behaviour.pdf', format='pdf', bbox_inches='tight')
    print(str(variable1) + ' for the year ' + str(yr))

result_variability_onerow('streamflow', 'precip', 2005)
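One possible way around the error is sketched below. With plt.subplots(1, 2) the returned object is a NumPy array holding two Axes, and twinx() is a method of a single Axes, so the array has to be indexed first. The data here is synthetic and the names days, streamflow, precip and ax_precip are placeholders rather than the variables from the question, so treat this as a sketch under those assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins for the streamflow and precipitation series (assumed data).
days = np.arange(100)
streamflow = 90 + 60 * np.sin(days / 15.0)
precip = np.abs(10 * np.cos(days / 7.0)) + 4

fig, axes = plt.subplots(1, 2, figsize=(15, 5))  # axes is a NumPy array of two Axes

axes[0].plot(days, streamflow, label="Phase change")
axes[1].plot(days, streamflow, label="Temperature change")

# twinx() belongs to an individual Axes, so pick the panel before calling it.
ax_precip = axes[1].twinx()
ax_precip.bar(days, precip, color="tab:blue", alpha=0.3, label="Precipitation")
ax_precip.set_ylim(4, 20)
ax_precip.set_ylabel("Precipitation")

for ax in axes:
    ax.set_ylim(0, 180)
    ax.grid()
axes[0].set_ylabel("Streamflow (mm)")
axes[0].legend()

# Merge the legend entries from the left and right y-axes of the second panel.
lines, line_labels = axes[1].get_legend_handles_labels()
bars, bar_labels = ax_precip.get_legend_handles_labels()
axes[1].legend(lines + bars, line_labels + bar_labels)
```

In the function from the question, the equivalent change would be to create the twin axis from ax[1] instead of ax and to draw the bar chart on that twin axis.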
Related
for i in range(len(basin)):
    prefix = "URL here"
    state = "OR"
    basin_name = basin[i]
    df_orig = pd.read_csv(f"{prefix}/{basin_name}.csv", index_col=0)
    #---create date x-index
    curr_wy_date_rng = pd.date_range(
        start=dt(curr_wy-1, 10, 1),
        end=dt(curr_wy, 9, 30),
        freq="D",
    )
    if not calendar.isleap(curr_wy):
        print("dropping leap day")
        df_orig.drop(["02-29"], inplace=True)
    use_cols = ["Median ('91-'20)", f"{curr_wy}"]
    df = pd.DataFrame(data=df_orig[use_cols].copy())
    df.index = curr_wy_date_rng
    #--create EOM percent of median values-------------------------------------
    curr_wy_month_rng = pd.date_range(
        start=dt(curr_wy-1, 10, 1),
        end=dt(curr_wy, 6, 30),
        freq="M",
    )
    df_monthly_prec = pd.DataFrame(data=df_monthly_basin[basin[i]].copy())
    df_monthly_prec.index = curr_wy_month_rng
    df_monthly = df.groupby(pd.Grouper(freq="M")).max()
    df_monthly["date"] = df_monthly.index
    df_monthly["wy_date"] = df_monthly["date"].apply(lambda x: cal_to_wy(x))
    df_monthly.index = pd.to_datetime(df_monthly["wy_date"])
    df_monthly.index = df_monthly["date"]
    df_monthly["month"] = df_monthly["date"].apply(
        lambda x: calendar.month_abbr[x.month]
    )
    df_monthly["wy"] = df_monthly["wy_date"].apply(lambda x: x.year)
    df_monthly.sort_values(by="wy_date", axis=0, inplace=True)
    df_monthly.drop(
        columns=[i for i in df_monthly.columns if "date" in i], inplace=True
    )
    # df_monthly.index = df_monthly['month']
    df_merge = pd.merge(df_monthly, df_monthly_prec, how='inner', left_index=True, right_index=True)
    #---Subplots---------------------------------------------------------------
    fig, ax = plt.subplots(figsize=(8,4))
    ax.plot(df_merge.index, df_merge["Median ('91-'20)"], color="green", linewidth="1", linestyle="dashed", label='Median Snowpack')
    ax.plot(df_merge.index, df_merge[f'{curr_wy}'], color='red', linewidth='2', label='WY Current')
    #------Setting x-axis range to expand bar width for ax2
    ax.bar(df_merge.index, df_merge[basin[i]], color='blue', label='Monthly %')
    #n = n + 1
    #--format chart
    ax.set_title(chart_name[w], fontweight='bold')
    w = w + 1
    ax.set_ylabel("Basin Precipitation Index")
    ax.set_yticklabels([])
    ax.margins(x=0)
    ax.legend()
    #plt.xlim(0,9)
    #---Setting date format
    ax.xaxis.set_major_locator(mdates.MonthLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
    #---EXPORT
    plt.show()
Desired end result: plot the monthly DataFrame (df_monthly_prec) together with the daily DataFrame, which currently charts only its end-of-month values (df_monthly). Ideally, each bar for the monthly DataFrame should span its whole month on the chart.
I have tried creating a secondary axis, but had trouble aligning the dates on the primary and secondary axes. Ideally, I would also like to plot df in place of df_monthly (showing all daily data instead of just the end-of-month values within the daily dataset).
Any assistance or pointers would be much appreciated! Apologies if additional clarification is needed.
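A minimal sketch of one way to get a daily line together with monthly bars that each span their whole month is shown below. It relies on matplotlib date axes measuring bar widths in days, so a bar anchored at the first of a month with align="edge" and a width equal to the number of days in that month covers exactly that month. All values are made up, and names such as daily_values, monthly_pct, month_starts and month_lengths are hypothetical stand-ins for the question's DataFrames.

```python
import calendar
from datetime import date, timedelta

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical water-year window (Oct 1 to Jun 30) with made-up values.
start, end = date(2022, 10, 1), date(2023, 6, 30)
days = [start + timedelta(days=n) for n in range((end - start).days + 1)]
daily_values = np.linspace(0, 30, len(days))  # stand-in for the daily series

# First day of each month in the window and the number of days in that month.
month_starts = [d for d in days if d.day == 1]
month_lengths = [calendar.monthrange(d.year, d.month)[1] for d in month_starts]
monthly_pct = np.linspace(60, 140, len(month_starts))  # stand-in monthly values

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(days, daily_values, color="red", linewidth=2, label="WY Current")

# The twin axis shares the date x-axis, so the bars and the line stay aligned.
ax2 = ax.twinx()
bars = ax2.bar(month_starts, monthly_pct, width=month_lengths, align="edge",
               color="blue", alpha=0.3, label="Monthly %")

# Draw the line above the bars; hide the primary background so the bars show.
ax.set_zorder(ax2.get_zorder() + 1)
ax.patch.set_visible(False)
ax.margins(x=0)
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
```

With pandas data, the same idea should apply if the monthly values are indexed by month start rather than month end before plotting.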
I'm really struggling to achieve what feels like an incredibly basic geom_bar plot. I would like the sum of y to be represented by one solid bar (with a black outline, colour = "black") for each bin of width 10 in x. I know that stat = "identity" is what creates the unnecessary individual blocks in each bar, but I can't find an alternative that achieves what is so close to my end goal. I cheated and made the desired plot below in Illustrator.
I don't really want to code x as a factor for the bins, as I want to keep the format of the axis ticks and text rather than having text such as "0 -10", "10 -20" etc. Is there a way to do this in ggplot without using the summarise or cut functions on the raw data? I am also aware of the geom_col and stat_count options but, again, can't achieve my desired outcome.
The data frame is below, where y holds counts at various values of a continuous variable x, and type is a factor variable.
y = c(1 ,1, 3, 2, 1, 1, 2, 1, 1, 1, 1, 1, 4, 1, 1,1, 2, 1, 2, 3, 2, 2, 1)
x = c(26.7, 28.5, 30.0, 34.8, 35.0, 36.4, 38.6, 40.0, 42.1, 43.7, 44.1, 45.0, 45.5, 47.4, 48.0, 57.2, 57.8, 64.2, 65.0, 66.7, 68.0, 74.4, 94.1)
type = c(rep("Type 1", 20), "Type 2", rep("Type 1", 2))
df<-data.frame(x,y,type)
Bar plot of the total y count for each bin of x. I am trying to fill by the total of each type, but I get the individual values as separate blocks, as shown by the black outlines. I would like one total per type in each bar.
ggplot(df,aes(y=y, x=x))+
geom_bar(stat = "identity",color = "black", aes(fill = type))+
scale_x_binned(limits = c(20,100))+
scale_y_continuous(expand = c(0, 0), breaks = seq(0,10,2)) +
xlab("")+
ylab("Total Count")
Alternatively, I tried to show just the total count within each bin, but I don't want the internal lines in the bars, only the black outline around each bar:
ggplot(df,aes(y=y, x=x))+
geom_col(fill = "#00C3C6", color = "black")+
scale_x_binned(limits = c(20,100))+
scale_y_continuous(expand = c(0, 0), breaks = seq(0,10,2)) +
xlab("")+
ylab("Total Count")
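Here is one way to do it, by transforming the data first and then using geom_col:
library(dplyr)
library(ggplot2)
# Assign each x to the lower edge of its bin of width 10, then sum y per bin and type
df <- df |>
  mutate(bins = floor(x/10) * 10) |>
  group_by(bins, type) |>
  summarise(y = sum(y), .groups = "drop")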
ggplot(data = df,
aes(y = y,
x = bins))+
geom_col(aes(fill = type),
color = "black")+
scale_x_continuous(breaks = seq(0,100,10)) +
scale_y_continuous(expand = c(0, 0),
breaks = seq(0,10,2)) +
xlab("")+
ylab("Total Count")
I am trying to plot 3 histograms on the same plot, with percentage on the y axis, but I am not sure how.
You can see what I am trying to do, but I can't get there! The bars don't scale properly and I am not sure how to fix that. Please help.
import matplotlib.pyplot as plt
import matplotlib.ticker as tkr
import numpy as np

data_1 = [1,1,2,3,4,4,5,6,7,7,7,8,9]
data_2 = [4,2,3,4,1,1,6,8,9,9,9,5,6]
data_3 = [1,2,3,4,5,6,7,8,9,9,4,3,7,1,2,3,2,5,7,3,4,7,3,8,2,3,4,7,2]
bin = [1,10,1]
bin[1] += bin[2]*2
bin[0] -= bin[2]
bins_list = np.arange(bin[0],bin[1],bin[2])
fig, ax = plt.subplots(figsize=(15, 10))
counts, bins, patches = plt.hist((np.clip(data_1, bins_list[0], bins_list[-1]),
                                  np.clip(data_2, bins_list[0], bins_list[-1]),
                                  np.clip(data_3, bins_list[0], bins_list[-1])),
                                 rwidth=0.8,
                                 bins=bins_list, color=('blue', 'red', 'green'),
                                 weights=None)
xlabels = bins[1:].astype(str)
xlabels[-1] = xlabels[-2] + '>'
xlabels[0] = xlabels[0] + '<'
for lb in range(1, len(xlabels)-1):
    xlabels[lb] = str(bins_list[lb])+'-'+xlabels[lb]
N_labels = len(xlabels)
plt.xticks(bin[2] * np.arange(N_labels)+bin[2]/2 + bin[0], rotation=300)
ax.set_xticklabels(xlabels)
total = 0
# Add percentages and value for each bar
for c in range(len(patches)):
    for count, p in zip(counts[c], patches[c]):
        percentage = '%0.2f%% ' % (100 * float(count) / counts[c].sum())
        total += 100 * float(count) / counts[c].sum()
        x1 = p.get_x()
        y1 = p.get_y() + p.get_height()
        ax.annotate(percentage, (x1, y1), rotation=270, fontsize = 10)
ax.yaxis.set_major_formatter(tkr.PercentFormatter(xmax=len(data_3)))
ax.grid(axis='y', color='r', linestyle=':')
ax.set_title("Please help")
ax.set_axisbelow(True)
fig.tight_layout()
[Figure: plot produced by the code above]
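You can pass density=True to plt.hist. Each dataset is then normalized on its own, and because the bins here have width 1, each bar height is the fraction of that dataset falling in the bin (so the y-axis formatter should become tkr.PercentFormatter(xmax=1) instead of xmax=len(data_3)). The hist call becomes: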
counts, bins, patches = plt.hist((np.clip(data_1 , bins_list[0], bins_list[-1]),
np.clip(data_2, bins_list[0], bins_list[-1]),
np.clip(data_3, bins_list[0], bins_list[-1])),
rwidth=0.8,
density=True, # here
bins=bins_list, color=('blue', 'red', 'green'),
weights=None)
[Figure: output of the histogram with density=True]
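As an alternative sketch, the weights parameter of hist can be used so that every observation contributes 100 / len(dataset) to its bin. Each dataset's bars then sum to 100 regardless of bin width, and the bar heights are already percentages. The names datasets, weights and percents below are introduced only for this illustration, and the clipping and tick relabelling from the question are left out for brevity.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.ticker as tkr
import numpy as np

data_1 = [1, 1, 2, 3, 4, 4, 5, 6, 7, 7, 7, 8, 9]
data_2 = [4, 2, 3, 4, 1, 1, 6, 8, 9, 9, 9, 5, 6]
data_3 = [1, 2, 3, 4, 5, 6, 7, 8, 9, 9, 4, 3, 7, 1, 2,
          3, 2, 5, 7, 3, 4, 7, 3, 8, 2, 3, 4, 7, 2]
datasets = [data_1, data_2, data_3]
bins_list = np.arange(0, 12, 1)  # same bin edges the question ends up with

# One weight per observation: 100 / n, so each dataset's bars add up to 100.
weights = [np.full(len(d), 100.0 / len(d)) for d in datasets]

fig, ax = plt.subplots(figsize=(15, 10))
percents, bins, patches = ax.hist(datasets, bins=bins_list, weights=weights,
                                  rwidth=0.8, color=("blue", "red", "green"))

# Bar heights are already percentages, so 100 on the axis maps to "100%".
ax.yaxis.set_major_formatter(tkr.PercentFormatter(xmax=100))
ax.grid(axis="y", linestyle=":")
ax.set_axisbelow(True)
```

Compared with density=True, this does not depend on the bin width being 1, which may matter if the bin step is changed later.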
I am trying to plot how different predictors associate with stroke and underlying phenotypes (i.e. cholesterol). I originally had working ggplot code in which shape denoted the variable (stroke, HDL cholesterol and total cholesterol) and colour denoted the type, i.e. disease (stroke) or phenotype (HDL/total cholesterol). To make the plot more intuitive, I want to swap shape and colour, but when I do this I have issues with position_dodge and the alignment of geom_point and geom_errorbar.
stroke_graph <- ggplot(stroke,aes(y=as.numeric(stroke$test),
x=Clock,
shape = Type,
colour = Variable)) +
geom_point(data=stroke, aes(shape=Type, colour=Variable), show.legend=TRUE,
position=position_dodge(width=0.5), size = 3) +
geom_errorbar(aes(ymin = as.numeric(stroke$LCI), ymax= as.numeric(stroke$UCI)),
position = position_dodge(0.5), width = 0.05,
colour ="black")+
ylab("standardised beta/log odds")+ xlab ("")+
geom_hline(yintercept = 0, linetype = "dotted")+
theme(axis.text.x = element_text(size = 10, vjust = 0.5), legend.position = "none",
plot.title = element_text(size = 12))+
scale_y_continuous(limit = c(-0.402, 0.7))+ scale_shape_manual(values=c(15, 17, 18))+
theme(legend.position="right") + labs(shape = "Variable") + guides(shape = guide_legend(reverse=TRUE)) +
coord_flip()
stroke_graph + ggtitle("Stroke and Associated Phenotypes") + theme(plot.title = element_text(hjust = 0.5))
Graph now: 1
Previously working graph - only difference in code is swapping "Type" and "Variable": 2
A project I previously submitted for a course worked as expected. I went back to run the code again and now get a Python traceback with an error message that didn't occur before:
'matplotlib.pyplot' has no attribute 'autofmt_xdate'
I loaded the weather station data files and ran all the code, which previously worked. Below is the code for the visualization plot:
plt.figure()
plt.plot(minmaxdf.loc[:,'Month-Day'], minmaxdf.loc[:,'min_tmps'] ,'-', c = 'cyan', linewidth=0.5, label = '10yr record lows')
plt.plot(minmaxdf.loc[:,'Month-Day'], minmaxdf.loc[:,'max_tmps'] , '-', c = 'orange', linewidth=0.5, label = '10yr record highs')
plt.gca().fill_between(range(len(minmaxdf.loc[:,'min_tmps'])), minmaxdf['min_tmps'], minmaxdf['max_tmps'], facecolor = (0.5, 0.5, 0.5), alpha = 0.5)
plt.scatter(minbreach15.loc[:,'Month-Day'], minbreach15.loc[:,'min_tmps_breach15'], s = 10, c = 'blue', label = 'Record low broken - 2015')
plt.scatter(maxbreach15.loc[:,'Month-Day'], maxbreach15.loc[:,'max_tmps_breach15'], s = 10, c = 'red', label = 'Record high broken - 2015')
plt.xlabel('Month')
plt.ylabel('Temperature (Tenths of Degrees C)')
plt.title('10yr Max/Min Temperature Range for Wilton CT 06897')
plt.gca().axis([0, 400, -500, 500])
plt.xticks(range(0, len(minmaxdf.loc[:,'Month-Day']), 30), minmaxdf.loc[:,'Month-Day'].index[range(0, len(minmaxdf.loc[:,'Month-Day']), 30)], rotation = '-45')
plt.xticks( np.linspace(0, 15 + 30*11 , num = 12), (r'Jan', r'Feb', r'Mar', r'Apr', r'May', r'Jun', r'Jul', r'Aug', r'Sep', r'Oct', r'Nov', r'Dec') )
plt.legend(loc = 4, frameon = False)
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)
plt.autofmt_xdate()
plt.show()
Previously, this produced a chart of the 10-year (2004-14) average temperature max/mins by day of year, overlaid with scatter points of the 2015 max/mins that exceeded those averages.
autofmt_xdate() is a method of Figure, not a function of the pyplot module. The call therefore needs to be
plt.gcf().autofmt_xdate()
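A small self-contained sketch of the same fix is given below, using an explicit Figure object instead of plt.gcf(). The dates and temperatures are invented for illustration and are not the weather station data from the question.

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Invented sample data: one reading roughly every 30 days during 2015.
dates = [dt.date(2015, 1, 1) + dt.timedelta(days=30 * n) for n in range(12)]
temps = [-50, -20, 30, 90, 150, 210, 250, 240, 190, 120, 50, -10]

fig, ax = plt.subplots()
ax.plot(dates, temps, "-", c="orange", linewidth=0.5, label="sample highs")
ax.set_ylabel("Temperature (Tenths of Degrees C)")
ax.legend(loc=4, frameon=False)

# autofmt_xdate lives on the Figure; it rotates and right-aligns the x tick labels.
fig.autofmt_xdate()

rotations = [label.get_rotation() for label in ax.get_xticklabels()]
```

Keeping a reference to the figure from plt.subplots() avoids relying on which figure pyplot currently treats as active.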