I have created a lollipop chart that I love. However, when the code runs to create the plot, the colors of the lines, segments, and points all change from what they were set to. Everything else runs great, so this isn't the end of the world, but I am trying to stick with a color palette throughout a report.
The lines should be "#9a0138" and the points and segments "#000775", but the plot comes out in different colors.
Any ideas?
Here is the data:
TabPercentCompliant <- structure(list(Provider_ShortName = c("ProviderA", "ProviderA", "ProviderA", "ProviderB",
"ProviderB", "ProviderB", "ProviderC", "ProviderC", "ProviderC", "ProviderD"), SubMeasureID = c("AMM2", "FUH7", "HDO", "AMM2", "FUH7", "HDO", "AMM2", "FUH7", "HDO", "AMM2"), AdaptedCompliant = c(139, 2, 117, 85, 1, 33, 36, 2, 22, 43), TotalEligible = c(238, 27, 155, 148, 10, 34, 61, 3, 24, 76), PercentCompliant = c(0.584033613445378, 0.0740740740740741, 0.754838709677419, 0.574324324324324, 0.1, 0.970588235294118, 0.590163934426229, 0.666666666666667, 0.916666666666667, 0.565789473684211 ), PercentTotalEligible = c(0.00516358587173479, 0.00058578495183546, 0.00336283953831467, 0.00321096936561659, 0.000216957389568689, 0.000737655124533542, 0.001323440076369, 6.50872168706066e-05, 0.000520697734964853, 0.00164887616072203), ClaimsAdjudicatedThrough = structure(c(19024, 19024, 19024, 19024, 19024, 19024, 19024, 19024, 19024, 19024 ), class = "Date"), AdaptedNCQAMean = c(0.57, 0.39, 0.93, 0.57, 0.39, 0.93, 0.57, 0.39, 0.93, 0.57), PerformanceLevel = c(0.0140336134453782, -0.315925925925926, -0.175161290322581, 0.00432432432432439, -0.29, 0.0405882352941176, 0.0201639344262295, 0.276666666666667, -0.0133333333333334, -0.00421052631578944)), row.names = c(NA, -10L), class = c("tbl_df", "tbl", "data.frame"))
VBP_Report_Date = "2022-09-01"
And the code for the plot:
Tab_PercentCompliant %>%
  filter(ClaimsAdjudicatedThrough == VBP_Report_Date) %>%
  ggplot(aes(x = Provider_ShortName,
             y = PercentCompliant)
  ) +
  geom_line(aes(x = Provider_ShortName,
                y = AdaptedNCQAMean,
                group = SubMeasureID,
                color = "#9a0138",
                size = .001)
  ) +
  geom_point(aes(color = "#000775",
                 size = (PercentTotalEligible)
  )
  ) +
  geom_segment(aes(x = Provider_ShortName,
                   xend = Provider_ShortName,
                   y = 0,
                   yend = PercentCompliant,
                   color = "#000775")
  )+
  facet_grid(cols = vars(SubMeasureID),
             scales = "fixed",
             space = "fixed")+
  theme_classic()+
  theme(legend.position = "none") +
  theme(panel.spacing = unit(.5, "lines"),
        panel.border = element_rect(
          color = "black",
          fill = NA,
          linewidth = .5),
        panel.grid.major.y = element_line(
          color = "gray",
          linewidth = .5),
        axis.text.x = element_text(
          angle = 65,
          hjust=1),
        axis.title.x = element_blank(),
        axis.line = element_blank(),
        strip.background = element_rect(
          color = NULL,
          fill = "#e1e7fa"))+
  scale_y_continuous(labels = scales::percent)+
  labs(title = "Test",
       subtitle = "Test",
       caption = "Test")
If an aesthetic is a constant, it is usually easier to set it outside the aes() call. Inside aes(), a string such as "#9a0138" is treated as a data value and mapped onto the default color scale, which is why the plot ignores the hex codes. If you want a legend for the color, keep the mapping inside aes() and set the colors manually with scale_color_manual() or scale_fill_manual().
I had to cut your code down quite a bit to make it run, and I removed parts that are unrelated to the problem. I removed size = 0.001 from the line layer because the line was not visible with it, and I removed the filter step because the plot could not be drawn with it.
Tip: aesthetics defined globally in ggplot(aes(x = ..., y = ...)) are inherited by every geom layer, so you don't need to repeat them in each layer. That makes the code more concise and readable.
library(ggplot2)
ggplot(TabPercentCompliant, aes(x = Provider_ShortName, y = PercentCompliant)) +
  # constant colors are set outside aes(), so they are used literally
  geom_line(aes(y = AdaptedNCQAMean, group = SubMeasureID),
            color = "#9a0138") +
  geom_point(aes(size = PercentTotalEligible), color = "#000775") +
  # x and y are inherited from ggplot(); only xend/yend and y = 0 are new here
  geom_segment(aes(xend = Provider_ShortName, y = 0, yend = PercentCompliant),
               color = "#000775") +
  facet_grid(~SubMeasureID) +
  theme(strip.background = element_rect(color = NULL, fill = "#e1e7fa"))
Here is the final code. Thanks again tjebo!
# Lollipop Chart ----------------------------------------------------------
Tab_PercentCompliant %>%
  filter(ClaimsAdjudicatedThrough == VBP_Report_Date) %>%
  ggplot(aes(x = Provider_ShortName,
             y = PercentCompliant)
  ) +
  geom_line(aes(y = AdaptedNCQAMean,
                group = SubMeasureID),
            color = "#9a0138"
  ) +
  geom_point(aes(size = PercentTotalEligible),
             color = "#000775"
  ) +
  geom_segment(aes(xend = Provider_ShortName,
                   y = 0,
                   yend = PercentCompliant),
               color = "#000775"
  )+
  facet_grid(cols = vars(SubMeasureID)
  )+
  theme_bw()+
  theme(legend.position = "none",
        axis.text.x = element_text(
          angle = 65,
          hjust=1),
        axis.title.x = element_blank(),
        axis.line = element_blank(),
        strip.background = element_rect(
          fill = "#e1e7fa"))+
  scale_y_continuous(labels = scales::percent)+
  labs(title = "Test",
       subtitle = "Test",
       caption = "Test")
I have two data sets, let's call them A and B (dput of the first 5 rows of each below):
`A: structure(list(Location = c(3960.82823, 3923.691, 3919.40593,
3907.97909, 3886.55377), Height = c(0.163744751, 0.231555472,
0.232150996, 0.192475738, 0.162966924), Start = c(3963.68494,
3946.54468, 3920.83429, 3909.40745, 3895.1239), End = c(3953.68645,
3920.83429, 3909.40745, 3895.1239, 3883.69706)), row.names = c(NA,
5L), class = "data.frame")
`
`B:structure(list(Wavenumber..cm.1. = c(3997.96546, 3996.5371, 3995.10875,
3993.68039, 3992.25204), M100 = c(0.00106, 0.00105, 0.00095,
0.00075, 0.00053), M101 = c(0.00081, 0.00092, 0.00102, 0.001,
0.00082), M102 = c(0.00099, 0.00109, 0.00105, 9e-04, 0.00072),
M103 = c(0.00101, 0.00111, 0.0012, 0.00129, 0.00133), M104 = c(0.00081,
0.00083, 0.00084, 0.00086, 0.00089), M105 = c(0.00139, 0.00113,
0.00092, 0.00089, 0.00102), M106 = c(0.00095, 0.00103, 0.00095,
0.00074, 0.00058), M107 = c(0.00054, 0.00058, 0.00059, 0.00049,
0.00032), M108 = c(0.00042, 5e-04, 5e-04, 0.00034, 0.00011
), M109 = c(0.00069, 0.00051, 0.00043, 0.00051, 0.00065),
M110 = c(0.00113, 0.00121, 0.00124, 0.00116, 0.00099), M111 = c(0.00039,
0.00056, 0.00068, 0.00068, 0.00056), M112 = c(0.0011, 0.00112,
0.00112, 0.00108, 0.00099), M113 = c(3e-04, 3e-04, 3e-04,
0.00027, 0.00019), M114 = c(0.00029, 6e-05, -2e-05, 9e-05,
0.00028), M115 = c(0.00091, 0.00079, 0.00061, 0.00038, 2e-04
), M116 = c(0.00117, 0.00105, 0.00096, 0.00092, 0.00092),
M117 = c(0.00039, 2e-04, 6e-05, 6e-05, 0.00018), M118 = c(0.00096,
0.00073, 0.00055, 0.00047, 0.00049), M119 = c(0.00037, 0.00031,
0.00024, 0.00018, 0.00018), M120 = c(0.00116, 0.00098, 0.00084,
0.00076, 0.00067), M121 = c(0.00039, 0.00024, 0.00011, 7e-05,
0.00011), M122 = c(0.00032, 0.00038, 0.00045, 0.00044, 0.00035
), M123 = c(9e-04, 0.00097, 0.00108, 0.0012, 0.00128), M124 = c(-0.00082,
-0.00065, -0.00049, -0.00037, -0.00036), M125 = c(0.00053,
0.00054, 0.00055, 6e-04, 0.00071), M126 = c(7e-05, 0.00022,
0.00022, 0.00011, 2e-05), M127 = c(0.00086, 9e-04, 0.00086,
0.00073, 0.00058), M128 = c(0.00089, 0.00078, 0.00069, 0.00057,
0.00043), M129 = c(0.00094, 0.00097, 0.00106, 0.00114, 0.00105
), M130 = c(0.0013, 0.00118, 0.00115, 0.00116, 0.00111),
M131 = c(0.00029, 0.00033, 0.00033, 3e-04, 0.00022), M132 = c(0,
0.00026, 0.00048, 6e-04, 0.00063), M133 = c(3e-05, -6e-05,
-6e-05, 5e-05, 0.00019), M134 = c(0.00056, 0.00054, 0.00052,
0.00054, 0.00057), M135 = c(2e-05, -4e-05, 6e-05, 0.00031,
0.00057), M136 = c(0.00083, 0.00075, 0.00068, 0.00068, 0.00073
), M137 = c(0.00064, 0.00074, 0.00084, 0.00095, 0.00105),
M139 = c(0.00044, 0.00044, 0.00042, 0.00043, 0.00047), M140 = c(0.00138,
0.00113, 0.00102, 0.0011, 0.00121), M141 = c(0.00062, 0.00043,
2e-04, 2e-05, 0), M142 = c(-0.00022, -0.00017, -0.00014,
-1e-04, 0), M143 = c(0.00109, 0.00108, 0.00103, 0.00093,
0.00087), M144 = c(0.00104, 0.00116, 0.00117, 0.00105, 0.00085
), M145 = c(7e-04, 0.00096, 0.00109, 0.00098, 0.00069), M146 = c(0.0014,
0.00158, 0.00165, 0.00154, 0.0013), M147 = c(6e-04, 0.00071,
0.00075, 0.00072, 0.00065), M148 = c(0.00098, 0.00093, 0.00091,
9e-04, 0.00088), M149 = c(0.00055, 0.00058, 0.00054, 0.00037,
0.00017), M150 = c(7e-04, 0.00068, 8e-04, 0.00107, 0.00132
), M151 = c(0.00037, 0.00042, 0.00046, 0.00047, 0.00046),
M152 = c(0.00047, 0.00042, 0.00043, 0.00045, 0.00045), M153 = c(0.00095,
0.00088, 0.00083, 8e-04, 0.00072), M154 = c(6e-05, 0.00013,
0.00032, 0.00054, 0.00062), M155 = c(0.00061, 0.00057, 0.00043,
0.00022, 4e-05), M156 = c(0.00077, 0.00078, 0.00071, 0.00052,
0.00025), M157 = c(0.00088, 0.00078, 0.00069, 0.00063, 0.00058
), M158 = c(0.00091, 0.00085, 0.00082, 0.00081, 8e-04), M159 = c(0.00078,
0.00076, 0.00073, 0.00074, 0.00079), M160 = c(0.00068, 7e-04,
0.00075, 8e-04, 0.00079), M161 = c(0.00055, 0.00073, 0.00082,
0.00085, 9e-04), M162 = c(0.00104, 0.00111, 0.0011, 0.00104,
0.00102), M163 = c(0.00076, 0.00071, 0.00069, 0.00068, 0.00067
), M164 = c(0.0012, 0.00133, 0.00154, 0.00174, 0.00177),
M165 = c(0.00072, 0.00073, 0.00072, 0.00074, 0.00083), M166 = c(0.00067,
0.00055, 0.00035, 0.00012, -2e-05), M167 = c(0.00068, 0.00053,
0.00047, 0.00051, 0.00059), M168 = c(0.00067, 0.00092, 0.001,
0.00087, 0.00067), M169 = c(0.00124, 0.00107, 0.00101, 0.00108,
0.00118), M170 = c(0.00054, 0.00064, 0.00069, 0.00066, 0.00053
), M171 = c(0.00029, 3e-04, 3e-04, 0.00031, 3e-04), M172 = c(0.00085,
0.00091, 0.00082, 0.00063, 0.00052), M173 = c(0.00022, 0.00036,
0.00053, 0.00061, 0.00056), M174 = c(5e-04, 0.00031, 0.00021,
0.00023, 0.00031), M175 = c(0.00074, 0.00066, 0.00059, 0.00051,
0.00043), M176 = c(9e-04, 0.00062, 0.00044, 0.00039, 0.00039
), M177 = c(0.00045, 0.00038, 0.00033, 0.00035, 0.00043),
M178 = c(0.00075, 0.00092, 0.00097, 0.00086, 0.00067), M179 = c(0.00047,
0.00033, 0.00026, 3e-04, 0.00037), M180 = c(0.00083, 0.00077,
0.00074, 0.00074, 7e-04), M181 = c(0.0013, 0.00138, 0.00137,
0.00127, 0.00109), M182 = c(0.00062, 0.00049, 0.00043, 0.00042,
0.00038), M183 = c(0.00056, 4e-04, 0.00034, 0.00046, 0.00065
), M184 = c(0.00122, 0.00116, 0.00096, 0.00067, 0.00039),
M185 = c(0.00045, 0.00026, 0.00012, 1e-04, 0.00024), M187 = c(0.00078,
0.00038, 8e-05, 0, 0.00014)), row.names = c(NA, 5L), class = "data.frame")
`
I want to be able to calculate the means of the M columns in data set B, based on the Start and End columns in data set A (which correspond to the Wavenumber cm-1 column in data set B). So that for each Start and End set of values you have a corresponding mean for each M column in data set B.
So for example for the Start and End values in the first row of data set A:
Start: 3963.68494 End: 3953.68645 you would calculate the mean of each M column in data set B using the absorbance values corresponding to the Wavenumber cm-1 range of 3963.6849 to 3953.68645, which would then be stored in a separate data frame (with all the M column names) called meanData or something.
I can't quite figure out how to write a function or loop that would do that: take the Start and End values from each row of data set A, find the absorbance values in data set B that fall within that range, calculate their mean, and write it into a new data frame under the corresponding M column name, repeating this for each row of data set A. I know you would likely do it with an index, but I'm not sure how to write it exactly. Any help would be very much appreciated!
I tried creating different indexes for the Start and End columns and using them to try and specify the values I want in dataset B, using [] but I was unsuccessful:
`test<-mean(B$M100[which(B$Wavenumber..cm.1.[index2[i] to B$Wavenumber..cm.1.index3[i]])`
where index2 holds the Start values from data set A and index3 holds the End values from data set A. This did not work.
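The filter-then-average logic described above can be sketched as follows. This is only an illustration in plain Python rather than R, under a few assumptions: the range bounds are treated as inclusive, the function name range_means and the short column name "Wavenumber" are hypothetical, and the two (Start, End) pairs are made-up values chosen so that they overlap the five rows of B shown in the dput (the real ranges in A lie outside those five rows). Only the M100 and M101 columns are included to keep it short.

```python
from statistics import mean

# Hypothetical (Start, End) pairs standing in for the rows of data set A.
# Start > End, as in the question.
ranges = [
    (3998.0, 3995.0),
    (3994.0, 3992.0),
]

# First five rows of data set B, restricted to two of the M columns.
spectra = {
    "Wavenumber": [3997.96546, 3996.5371, 3995.10875, 3993.68039, 3992.25204],
    "M100": [0.00106, 0.00105, 0.00095, 0.00075, 0.00053],
    "M101": [0.00081, 0.00092, 0.00102, 0.00100, 0.00082],
}


def range_means(ranges, spectra, x_col="Wavenumber"):
    """Return one dict per (start, end) pair with the mean of every other column."""
    value_cols = [col for col in spectra if col != x_col]
    result = []
    for start, end in ranges:
        low, high = min(start, end), max(start, end)
        # Indices of the rows whose wavenumber falls inside the range (inclusive).
        idx = [i for i, x in enumerate(spectra[x_col]) if low <= x <= high]
        row = {"Start": start, "End": end}
        for col in value_cols:
            # None marks a range that contains no rows at all.
            row[col] = mean(spectra[col][i] for i in idx) if idx else None
        result.append(row)
    return result


mean_data = range_means(ranges, spectra)
for row in mean_data:
    print(row)
```

In R, the same two steps would typically be a logical index on B$Wavenumber..cm.1. built from each Start and End pair, followed by colMeans() on the M columns of the selected rows.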
I've got a rough and ready function that can be used to compare two sets of values using histograms:
I want to set the individual edge colors of each of the histograms in the top plot (much as how I set the individual sets of values used for each histogram). How could this be done?
import os
import datavision
import matplotlib.pyplot
import numpy
import shijian
def main():
    a = numpy.random.normal(2, 2, size = 120)
    b = numpy.random.normal(2, 2, size = 120)
    save_histogram_comparison_matplotlib(
        values_1 = a,
        values_2 = b,
        label_1 = "a",
        label_2 = "b",
        normalize = True,
        label_ratio_x = "measurement",
        label_y = "",
        title = "comparison of a and b",
        filename = "histogram_comparison_1.png"
    )

def save_histogram_comparison_matplotlib(
    values_1 = None,
    values_2 = None,
    filename = None,
    directory = ".",
    number_of_bins = None,
    normalize = True,
    label_x = "",
    label_y = None,
    label_ratio_x = None,
    label_ratio_y = "ratio",
    title = "comparison",
    label_1 = "1",
    label_2 = "2",
    overwrite = True,
    LaTeX = False,
    #aspect = None,
    font_size = 20,
    color_1 = "#3861AA",
    color_2 = "#00FF00",
    color_3 = "#7FDADC",
    color_edge_1 = "#3861AA", # |<---------- insert magic for these
    color_edge_2 = "#00FF00", # |
    alpha = 0.5,
    width_line = 1
    ):
    matplotlib.pyplot.ioff()
    if LaTeX is True:
        matplotlib.pyplot.rc("text", usetex = True)
        matplotlib.pyplot.rc("font", family = "serif")
    if number_of_bins is None:
        number_of_bins_1 = datavision.propose_number_of_bins(values_1)
        number_of_bins_2 = datavision.propose_number_of_bins(values_2)
        number_of_bins = int((number_of_bins_1 + number_of_bins_2) / 2)
    if filename is None:
        if title is None:
            filename = "histogram_comparison.png"
        else:
            filename = shijian.propose_filename(
                filename = title + ".png",
                overwrite = overwrite
            )
    else:
        filename = shijian.propose_filename(
            filename = filename,
            overwrite = overwrite
        )
    values = []
    values.append(values_1)
    values.append(values_2)
    bar_width = 0.8
    figure, (axis_1, axis_2) = matplotlib.pyplot.subplots(
        nrows = 2,
        gridspec_kw = {"height_ratios": (2, 1)}
    )
    ns, bins, patches = axis_1.hist(
        values,
        color = [
            color_1,
            color_2
        ],
        density = normalize,
        histtype = "stepfilled",
        bins = number_of_bins,
        alpha = alpha,
        label = [label_1, label_2],
        rwidth = bar_width,
        linewidth = width_line,
        #edgecolor = [color_edge_1, color_edge_2] <---------- magic here? dunno
    )
    axis_1.legend(
        loc = "best"
    )
    bars = axis_2.bar(
        bins[:-1],
        ns[0] / ns[1],
        alpha = 1,
        linewidth = 0, #width_line
        width = bins[1] - bins[0]
    )
    for bar in bars:
        bar.set_color(color_3)
    axis_1.set_xlabel(label_x, fontsize = font_size)
    axis_1.set_ylabel(label_y, fontsize = font_size)
    axis_2.set_xlabel(label_ratio_x, fontsize = font_size)
    axis_2.set_ylabel(label_ratio_y, fontsize = font_size)
    #axis_1.xticks(fontsize = font_size)
    #axis_1.yticks(fontsize = font_size)
    #axis_2.xticks(fontsize = font_size)
    #axis_2.yticks(fontsize = font_size)
    matplotlib.pyplot.suptitle(title, fontsize = font_size)
    if not os.path.exists(directory):
        os.makedirs(directory)
    #if aspect is None:
    #    matplotlib.pyplot.axes().set_aspect(
    #        1 / matplotlib.pyplot.axes().get_data_ratio()
    #    )
    #else:
    #    matplotlib.pyplot.axes().set_aspect(aspect)
    figure.tight_layout()
    matplotlib.pyplot.subplots_adjust(top = 0.9)
    matplotlib.pyplot.savefig(
        directory + "/" + filename,
        dpi = 700
    )
    matplotlib.pyplot.close()

if __name__ == "__main__":
    main()
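You can simply plot two separate histograms that share the same bins: pass the bin edges returned by the first hist() call as bins to the second call, and give each call its own edgecolor.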
import numpy as np; np.random.seed(3)
import matplotlib.pyplot as plt
a = np.random.normal(size=(89,2))
kws = dict(histtype= "stepfilled",alpha= 0.5, linewidth = 2)
hist, edges,_ = plt.hist(a[:,0], bins = 6,color="lightseagreen", label = "A", edgecolor="k", **kws)
plt.hist(a[:,1], bins = edges,color="gold", label = "B", edgecolor="crimson", **kws)
plt.show()
Use the lists of Patches objects returned by the hist() function.
In your case, you have two datasets, so your variable patches will be a list containing two lists, each with the Patches objects used to draw the bars on your plot.
You can easily set the properties on all of these objects using the setp() function. For example:
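import numpy as np
import matplotlib.pyplot as plt
a = np.random.normal(size=(100,))
b = np.random.normal(size=(100,))
# e is a list with one container of patches per dataset
c,d,e = plt.hist([a,b], color=['r','g'])
plt.setp(e[0], edgecolor='k', lw=2)
plt.setp(e[1], edgecolor='b', lw=3)
plt.show()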
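For the "stepfilled" case in the question, the same idea could look roughly like the sketch below. It is a minimal, stripped-down version of the top plot only, not the full function: the parameter names color_edge_1 and color_edge_2 come from the question, while the edge color values, the random data, and the output filename are placeholders chosen for illustration. The Agg backend is selected only so that the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
values_1 = rng.normal(2, 2, size=120)
values_2 = rng.normal(2, 2, size=120)

# Placeholder edge colors, picked to differ visibly from the fill colors.
color_edge_1 = "#000000"
color_edge_2 = "#DC143C"

figure, axis_1 = plt.subplots()
ns, bins, patches = axis_1.hist(
    [values_1, values_2],
    color=["#3861AA", "#00FF00"],
    histtype="stepfilled",
    alpha=0.5,
    label=["a", "b"],
    linewidth=1,
)

# With two datasets, patches holds one group of artists per dataset,
# so each group can be given its own edge color.
for group, edge in zip(patches, [color_edge_1, color_edge_2]):
    plt.setp(group, edgecolor=edge)

axis_1.legend(loc="best")
figure.savefig("histogram_edges.png")
plt.close(figure)
```

Because both datasets go through a single hist() call, they share the same bins automatically, and the ratio plot can keep using ns and bins as before.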