geom_bar for total counts of binned continuous variable - ggplot2

I'm really struggling to achieve what feels like an incredibly basic geom_bar plot. I would like the sum of y to be represented by one solid bar (with a colour = "black" outline) in bins of 10 for x. I know that stat = "identity" is what creates the unnecessary individual blocks in each bar, but I can't find an alternative that gets me to what is so close to my end goal. I cheated and made the desired plot below in Illustrator.
I don't really want to code x as a factor for the bins, as I want to keep the format of the axis ticks and text rather than having text such as "0-10", "10-20" etc. Is there a way to do this in ggplot without using the summarise or cut functions on the raw data? I am also aware of the geom_col and stat_count options but, again, can't achieve my desired outcome.
DF as below, where y = counts at various values of a continuous variable x. Also a factor variable of type.
y = c(1 ,1, 3, 2, 1, 1, 2, 1, 1, 1, 1, 1, 4, 1, 1,1, 2, 1, 2, 3, 2, 2, 1)
x = c(26.7, 28.5, 30.0, 34.8, 35.0, 36.4, 38.6, 40.0, 42.1, 43.7, 44.1, 45.0, 45.5, 47.4, 48.0, 57.2, 57.8, 64.2, 65.0, 66.7, 68.0, 74.4, 94.1)
type = c(rep("Type 1", 20), "Type 2", rep("Type 1", 2))
df<-data.frame(x,y,type)
Bar plot of total y count for each bin of x - trying to fill by total of type, but getting individual proportions as shown by line colour = black. Would like total for each type in each bar.
ggplot(df,aes(y=y, x=x))+
geom_bar(stat = "identity",color = "black", aes(fill = type))+
scale_x_binned(limits = c(20,100))+
scale_y_continuous(expand = c(0, 0), breaks = seq(0,10,2)) +
xlab("")+
ylab("Total Count")
Or trying to just have the total count within each bin but don't want the internal lines in the bars, just the outer colour = black for each bar
ggplot(df,aes(y=y, x=x))+
geom_col(fill = "#00C3C6", color = "black")+
scale_x_binned(limits = c(20,100))+
scale_y_continuous(expand = c(0, 0), breaks = seq(0,10,2)) +
xlab("")+
ylab("Total Count")

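Here is one way to do it, by summarising the data first and then using geom_col:
library(dplyr)
library(ggplot2)
# Sum y within each 10-unit bin of x, separately for each type
df_binned <- df |>
mutate(bins = floor(x/10) * 10 + 5) |> # bin midpoint, so each bar sits between its bin edges
group_by(bins, type) |>
summarise(y = sum(y), .groups = "drop")
ggplot(data = df_binned,
aes(y = y,
x = bins))+
geom_col(aes(fill = type),
color = "black",
width = 10)+ # each bar spans its full bin, e.g. 20 to 30
scale_x_continuous(breaks = seq(0,100,10)) +
scale_y_continuous(expand = c(0, 0),
breaks = seq(0,10,2)) +
xlab("")+
ylab("Total Count")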

Related

Adjust binwidth size for faceted dotplot with free y axis

I would like to adjust the binwidth of a faceted geom_dotplot while keeping the dot sizes the same.
Using the default binwidth (1/30 of the data range), I get the following plot:
library(ggplot2)
df = data.frame(
t = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2),
x = 1,
y = c(1, 2, 3, 4, 5, 100, 200, 300, 400, 500)
)
ggplot(df, aes(x=x, y=y)) +
geom_dotplot(binaxis="y", stackdir="center") +
facet_wrap(~t, scales="free_y")
However, if I change the binwidth value, the new value is taken as an absolute value (and not the ratio of the data range), so the two facets get differently sized dots:
geom_dotplot(binaxis="y", stackdir="center", binwidth=2) +
Is there a way to adjust binwidth so it is relative to its facet's data range?
One option to achieve your desired result is to use multiple geom_dotplot layers, which lets you set the binwidth for each facet separately. This does require some manual work to compute the binwidths so that the dots are the same size in each facet:
library(ggplot2)
y_ranges <- tapply(df$y, factor(df$t), function(x) diff(range(x)))
binwidth1 <- 2
scale2 <- binwidth1 / (y_ranges[[1]] / 30)
binwidth2 <- scale2 * y_ranges[[2]] / 30
ggplot(df, aes(x=x, y=y)) +
geom_dotplot(data = ~subset(.x, t == 1), binaxis="y", stackdir="center", binwidth = binwidth1) +
geom_dotplot(data = ~subset(.x, t == 2), binaxis="y", stackdir="center", binwidth = binwidth2) +
facet_wrap(~t, scales="free_y")
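The arithmetic behind binwidth2 can be checked on its own. The following is only a sketch, written in Python rather than R so it can be run outside the plotting code; the `facets` dictionary is an assumption that copies the y values from the question's data frame.

```python
# y values per facet, copied from the question's data frame (t = 1 and t = 2)
facets = {1: [1, 2, 3, 4, 5], 2: [100, 200, 300, 400, 500]}

# Data range of y within each facet: 4 for facet 1, 400 for facet 2
y_ranges = {t: max(ys) - min(ys) for t, ys in facets.items()}

binwidth1 = 2
# How many times larger binwidth1 is than the default (1/30 of the range)
scale2 = binwidth1 / (y_ranges[1] / 30)
# Apply the same multiple of the default binwidth to the second facet
binwidth2 = scale2 * y_ranges[2] / 30

# Both binwidths are the same fraction of their facet's range,
# which is why the dots come out the same size.
fraction1 = binwidth1 / y_ranges[1]
fraction2 = binwidth2 / y_ranges[2]
```

Both fractions come out at one half of the facet's range, so binwidth2 works out to 200 (up to floating-point rounding).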

Stack bars with percentages and values shown

Here is my dataframe - data_long1
data_long1 <- data.frame(
value = c(88, 22, 100, 12, 55, 17, 10, 2, 2),
Subtype = as.factor(c("lung","prostate",
"oesophagus","lung","prostate","oesophagus","lung",
"prostate","oesophagus")),
variable = as.factor(c("alive","alive",
"alive","dead","dead","dead","uncertain","uncertain",
"uncertain"))
)
The following code gives me a nice graph that I want, with all the values displayed, but none in percentages.
ggplot(data_long1, aes(x = Subtype, y = value, fill = variable)) + geom_bar(stat = "identity") +
geom_text(aes(label= value), size = 3, hjust = 0.1, vjust = 2, position = "stack")
What I am looking for is a stacked bar chart with the actual values on the y-axis, not percentages (like the previous graph), but also with a percentage figure displayed on each subsection of each bar. I tried this code and got a meaningless graph with every stack being 33.3%.
data_long1 %>% count(Subtype, variable) %>% group_by(Subtype) %>% mutate(pct= prop.table(n) * 100) %>% ggplot() + aes(x = Subtype, y = variable, fill=variable) +
geom_bar(stat="identity") + ylab("Number of Patients") +
geom_text(aes(label=paste0(sprintf("%1.1f", pct),"%")), position=position_stack(vjust=0.5)) + ggtitle("My Tumour Sites") + theme_bw()
I cannot seem to find a way to use the mutate function to resolve this problem. Please help.
I would pre-compute the summaries you want. Here is the proportion within each subtype:
library(dplyr)
library(ggplot2)
data_long2 <- data_long1 %>%
group_by(Subtype) %>%
mutate(proportion = value / sum(value))
ggplot(data_long2, aes(x = Subtype, y = value, fill = variable)) +
geom_bar(stat = "identity") +
geom_text(aes(label= sprintf('%0.0f%%', proportion * 100)), size = 3, hjust = 0.1, vjust = 2, position = "stack")
You can also get the proportion across all groups and types simply by removing the group_by statement:
data_long2 <- data_long1 %>%
mutate(proportion = value / sum(value))
ggplot(data_long2, aes(x = Subtype, y = value, fill = variable)) +
geom_bar(stat = "identity") +
geom_text(aes(label= sprintf('%0.0f%%', proportion * 100)), size = 3, hjust = 0.1, vjust = 2, position = "stack")

Stacked Bar Chart Labels-- Using geom_text to label % on a value based y-axis

I am looking to create a stacked bar chart where my y-axis measures the value but the labels show the % of the total bar.
I think I need to add a pct column to my table and then use that, but I am not sure how to get the pct column either.
Df for example is:
date, type, value, pct
Jan 1, A, 5, 45% (5/11)
Jan 1, B, 6, 55% (6/11)
Maybe something like this?
library(dplyr)
library(ggplot2)
test.df <- data.frame(date = c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"),
type = c("A", "B", "A", "B"),
val = c(5:6, 1, 7))
test.df <- test.df %>%
group_by(date) %>%
mutate(
prop = val/sum(val),
# type "B" is stacked at the bottom, so its label sits at val; "A" sits at the bar total
y_text_pos = ifelse(type=="B", val, sum(val))) %>%
ungroup()
ggplot(data = test.df, aes(x = as.Date(date), y = val, fill = type)) +
geom_col() +
geom_text(aes(y = y_text_pos, label = paste0(round(prop*100,1), "%")), color = "black", vjust = 1.1)

'matplotlib.pyplot' has no attribute 'autofmt_xdate'

A project I previously submitted for a course worked as expected. I went back to run the code again and now get a Python traceback error message that didn't occur before:
'matplotlib.pyplot' has no attribute 'autofmt_xdate'
I loaded the weather station data files and ran all the code, which previously worked. Below is the code for the visualization plot:
plt.figure()
plt.plot(minmaxdf.loc[:,'Month-Day'], minmaxdf.loc[:,'min_tmps'] ,'-', c = 'cyan', linewidth=0.5, label = '10yr record lows')
plt.plot(minmaxdf.loc[:,'Month-Day'], minmaxdf.loc[:,'max_tmps'] , '-', c = 'orange', linewidth=0.5, label = '10yr record highs')
plt.gca().fill_between(range(len(minmaxdf.loc[:,'min_tmps'])), minmaxdf['min_tmps'], minmaxdf['max_tmps'], facecolor = (0.5, 0.5, 0.5), alpha = 0.5)
plt.scatter(minbreach15.loc[:,'Month-Day'], minbreach15.loc[:,'min_tmps_breach15'], s = 10, c = 'blue', label = 'Record low broken - 2015')
plt.scatter(maxbreach15.loc[:,'Month-Day'], maxbreach15.loc[:,'max_tmps_breach15'], s = 10, c = 'red', label = 'Record high broken - 2015')
plt.xlabel('Month')
plt.ylabel('Temperature (Tenths of Degrees C)')
plt.title('10yr Max/Min Temperature Range for Wilton CT 06897')
plt.gca().axis([0, 400, -500, 500])
plt.xticks(range(0, len(minmaxdf.loc[:,'Month-Day']), 30), minmaxdf.loc[:,'Month-Day'].index[range(0, len(minmaxdf.loc[:,'Month-Day']), 30)], rotation = '-45')
plt.xticks( np.linspace(0, 15 + 30*11 , num = 12), (r'Jan', r'Feb', r'Mar', r'Apr', r'May', r'Jun', r'Jul', r'Aug', r'Sep', r'Oct', r'Nov', r'Dec') )
plt.legend(loc = 4, frameon = False)
plt.gca().spines['top'].set_visible(False)
plt.gca().spines['right'].set_visible(False)
plt.autofmt_xdate()
plt.show()
Previously this produced a chart, by day of year, of the 10-year (2004-14) temperature max/mins, overlaid with scatter points for the 2015 max/mins that exceeded them.
autofmt_xdate() is a method of the Figure, not a function of pyplot. The command hence needs to be
plt.gcf().autofmt_xdate()
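A minimal sketch of the difference, using made-up dates and temperatures rather than the weather-station data from the question (the `days` and `temps` names are assumptions for illustration only):

```python
import datetime as dt

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical stand-in data: ten consecutive days with made-up temperatures
days = [dt.date(2015, 1, 1) + dt.timedelta(days=i) for i in range(10)]
temps = [3, 5, 4, 6, 8, 7, 9, 11, 10, 12]

fig, ax = plt.subplots()
ax.plot(days, temps, "-", color="orange", linewidth=0.5)

# The pyplot module itself does not expose autofmt_xdate ...
pyplot_has_it = hasattr(plt, "autofmt_xdate")

# ... but the Figure does; plt.gcf() returns the same current figure as fig here
fig.autofmt_xdate()
```

If you already hold a reference to the figure (as with `fig` above), calling the method on it directly avoids relying on whichever figure happens to be current.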

while creating table underneath axis on a plot, is there a way to create some whitespace between the axis and the table using matplotlib?

I am trying to create a table inside a plot, right underneath the axis of the plot, using matplotlib.
I am using the plt.table function to do this.
However, when I create the table, it is drawn right on top of the axis and hence overlaps with the axis labels.
Is there a way to create some white space in between?
The code looks something like this:
for key, value in arrayToPlot.iteritems():
    ax1 = fig.add_subplot(2, 2, 1)
    if value["HorErr"]:
        cdf = []
        #calculate percentile stats for the value["HorErr"]
        cdfArrayPointer[key]["HorErr"]["percentileStats"]=libMath.percentileForListofPercentiles( value["HorErr"], PERCENTILE, validPointsOnly = True )
        # now calculate the cdf values
        cdfArrayPointer[key]["HorErr"]["cdf"] = libMath.cdf( value["HorErr"], 2, 400, validPointsOnly = True)
        for k, v in cdfArrayPointer[key]["HorErr"]["cdf"].iteritems():
            cdf.append( v )
        #plot the cdf value
        ax1.plot(cdf, 'o-', label = ('HorErr for ' + str( key) ), color = getColour(key), markersize=3)
        plt.title("CDF Plot of 2D-Horizontal Error", size = 8)
        plt.ylabel('Percentile %', size = 7)
        plt.xlabel('Horizontal Error [m]', size = 6)
        plt.axis([0, 150, 0, 110])
        leg = plt.legend(loc = 4)
        setLegendSize( leg, 7)
        # creating the table to be drawn on the axis
        tableTexts["rows"].append(key)
        tableTexts["rowColour"].append(getColour(key))
        if (len(tableTexts["col"]) == 0):
            tableTexts["col"] = tuple(cdfArrayPointer[key]["HorErr"]["percentileStats"].keys())
        tableTexts["values"].append(cdfArrayPointer[key]["HorErr"]["percentileStats"].values())
the_table = plt.table(cellText=tableTexts["values"], rowLabels= tableTexts["rows"], rowColours= tableTexts["rowColour"] ,colLabels= tableTexts["col"], loc="bottom")
What about breaking your figure up using subplot?
You could have the axis in one subplot, and the table in another. (See example near bottom of page here)
I can illustrate further if you can't follow.
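A rough sketch of that idea, assuming made-up numbers in place of the question's CDF and percentile data (`errors`, `percentiles`, `row_labels`, `col_labels` and `cell_text` are all hypothetical stand-ins). The plot goes in the upper subplot and the table in a lower one with its axis switched off, so the gap between them is controlled by the subplot spacing:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical stand-in data, not taken from the question
errors = [5, 12, 20, 33, 47, 60, 85, 110]
percentiles = [10, 25, 40, 55, 70, 80, 90, 100]
col_labels = ("50%", "67%", "95%")
row_labels = ["series A", "series B"]
cell_text = [["30.1", "44.8", "97.5"], ["28.4", "41.0", "90.2"]]

# Two stacked subplots: a tall one for the plot, a short one for the table.
# hspace sets the vertical gap between them, i.e. the whitespace under the axis.
fig, (ax_plot, ax_table) = plt.subplots(
    2, 1, gridspec_kw={"height_ratios": [3, 1], "hspace": 0.4}
)

ax_plot.plot(errors, percentiles, "o-", markersize=3)
ax_plot.set_xlabel("Horizontal Error [m]")
ax_plot.set_ylabel("Percentile %")

# Hide the second subplot's frame and ticks so only the table is visible
ax_table.axis("off")
the_table = ax_table.table(
    cellText=cell_text,
    rowLabels=row_labels,
    colLabels=col_labels,
    loc="center",
)
```

Increasing `hspace` (or changing `height_ratios`) moves the table further from the x-axis label without touching the table itself.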