scale_fill_manual is only applying to the legend and not bars - ggplot2

I have a data frame (df) that looks like the following:
gene  p_value  p_value_dif  p-value.category
a     0.06      0.01        non-sig
c     0.07      0.02        non-sig
d     0.008    -0.03        sig
e     0.009    -0.04        sig
I have created a diverging bar graph with the following code:
library(ggplot2)
ggplot(df, aes(x = gene,
               y = p_value_dif,
               label = p_value_dif)) +
  geom_bar(stat = 'identity',
           aes(fill = as.factor(p_value_dif)),
           width = 0.9) +
  scale_fill_manual("legend",
                    values = c("Significant" = "black", "Insignificant" = "orange")) +
  coord_flip()
The issue is that only my legend changes colors to black and orange. The bars remain grey. What can I do so that the colors in my legend match the colors of my bars?
Note: if I don't wrap p_value_dif in as.factor() for fill, I get the following error:
Error: Continuous value supplied to discrete scale

The issue is that your column p_value_dif does not contain the values "Significant" or "Insignificant". In scale_fill_manual() the names of values are matched against the values of the variable mapped on fill, so only bars whose value is "Significant" or "Insignificant" would be filled black or orange. All other values are filled with the scale's na.value, which defaults to "grey50". Instead, you could map your column p-value.category on fill and set your fill colors and labels like so:
library(ggplot2)
ggplot(df, aes(x=gene,
y=p_value_dif ,
label=p_value_dif )) +
geom_bar(stat='identity',
aes(fill= `p-value.category`),
width=0.9) +
scale_fill_manual("legend",
values = c("sig" = "black", "non-sig" = "orange"),
labels = c("sig" = "Significant", "non-sig" = "Insignificant"))+
coord_flip()
DATA
df <- data.frame(
  stringsAsFactors = FALSE,
  check.names = FALSE,
  gene = c("a", "c", "d", "e"),
  p_value = c(0.06, 0.07, 0.008, 0.009),
  p_value_dif = c(0.01, 0.02, -0.03, -0.04),  # numeric, so the y axis is continuous
  `p-value.category` = c("non-sig", "non-sig", "sig", "sig")
)
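To confirm that the colours reach the bars and not only the legend, one can inspect the data ggplot2 actually draws with ggplot_build(). The sketch below is a minimal check, assuming ggplot2 is installed; the names p, built and bar_fill are illustrative. It rebuilds the answer's plot from the DATA above (using geom_col(), the shorthand for geom_bar(stat = "identity")) and reads the fill colour of every bar.

```r
library(ggplot2)

# Same data as above; p_value_dif is numeric so the y axis is continuous
df <- data.frame(
  stringsAsFactors = FALSE,
  check.names = FALSE,
  gene = c("a", "c", "d", "e"),
  p_value = c(0.06, 0.07, 0.008, 0.009),
  p_value_dif = c(0.01, 0.02, -0.03, -0.04),
  `p-value.category` = c("non-sig", "non-sig", "sig", "sig")
)

p <- ggplot(df, aes(x = gene, y = p_value_dif, fill = `p-value.category`)) +
  geom_col(width = 0.9) +
  scale_fill_manual(name = "legend",
                    # the names must be values that actually occur in the fill column
                    values = c("sig" = "black", "non-sig" = "orange"),
                    labels = c("sig" = "Significant", "non-sig" = "Insignificant")) +
  coord_flip()

# ggplot_build() returns the data each layer draws, including each bar's fill colour
built <- ggplot_build(p)$data[[1]]
bar_fill <- built$fill[order(as.numeric(built$x))]  # bars a, c, d, e in x order
print(bar_fill)
```

If the names in values did not match the data, bar_fill would instead hold the scale's na.value ("grey50" by default), which is the grey seen in the question.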

Related

How to expand bars over the month on the x-axis while being the same width?

import calendar
from datetime import datetime as dt  # assumed: dt(...) in the original is datetime(...)

import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# basin, curr_wy, cal_to_wy, df_monthly_basin, chart_name and w are defined
# elsewhere in the original script (not shown in the question)
for i in range(len(basin)):
    prefix = "URL here"
    state = "OR"
    basin_name = basin[i]
    df_orig = pd.read_csv(f"{prefix}/{basin_name}.csv", index_col=0)
    # ---create date x-index
    curr_wy_date_rng = pd.date_range(
        start=dt(curr_wy - 1, 10, 1),
        end=dt(curr_wy, 9, 30),
        freq="D",
    )
    if not calendar.isleap(curr_wy):
        print("dropping leap day")
        df_orig.drop(["02-29"], inplace=True)
    use_cols = ["Median ('91-'20)", f"{curr_wy}"]
    df = pd.DataFrame(data=df_orig[use_cols].copy())
    df.index = curr_wy_date_rng
    # --create EOM percent of median values-------------------------------------
    curr_wy_month_rng = pd.date_range(
        start=dt(curr_wy - 1, 10, 1),
        end=dt(curr_wy, 6, 30),
        freq="M",
    )
    df_monthly_prec = pd.DataFrame(data=df_monthly_basin[basin[i]].copy())
    df_monthly_prec.index = curr_wy_month_rng
    df_monthly = df.groupby(pd.Grouper(freq="M")).max()
    df_monthly["date"] = df_monthly.index
    df_monthly["wy_date"] = df_monthly["date"].apply(lambda x: cal_to_wy(x))
    df_monthly.index = pd.to_datetime(df_monthly["wy_date"])
    df_monthly.index = df_monthly["date"]
    df_monthly["month"] = df_monthly["date"].apply(
        lambda x: calendar.month_abbr[x.month]
    )
    df_monthly["wy"] = df_monthly["wy_date"].apply(lambda x: x.year)
    df_monthly.sort_values(by="wy_date", axis=0, inplace=True)
    df_monthly.drop(
        columns=[c for c in df_monthly.columns if "date" in c], inplace=True
    )
    # df_monthly.index = df_monthly['month']
    df_merge = pd.merge(df_monthly, df_monthly_prec, how="inner", left_index=True, right_index=True)
    # ---Subplots---------------------------------------------------------------
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.plot(df_merge.index, df_merge["Median ('91-'20)"], color="green", linewidth=1,
            linestyle="dashed", label="Median Snowpack")
    ax.plot(df_merge.index, df_merge[f"{curr_wy}"], color="red", linewidth=2, label="WY Current")
    # ------Setting x-axis range to expand bar width for ax2
    ax.bar(df_merge.index, df_merge[basin[i]], color="blue", label="Monthly %")
    # n = n + 1
    # --format chart
    ax.set_title(chart_name[w], fontweight="bold")
    w = w + 1
    ax.set_ylabel("Basin Precipitation Index")
    ax.set_yticklabels([])
    ax.margins(x=0)
    ax.legend()
    # plt.xlim(0,9)
    # ---Setting date format
    ax.xaxis.set_major_locator(mdates.MonthLocator())
    ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
    # ---EXPORT
    plt.show()
End result desired: plot the monthly DataFrame (df_monthly_prec) together with the daily DataFrame reduced to monthly values (df_monthly). The bars for the monthly DataFrame should ideally span the whole month on the chart.
I have tried creating a secondary axis, but had trouble aligning the dates on the primary and secondary axes. Ideally, I would like to plot df instead of df_monthly (showing all daily data instead of just the end-of-month values within the daily dataset).
Any assistance or pointers would be much appreciated! Apologies if additional clarification is needed.

Stacked Bar Chart Labels-- Using geom_text to label % on a value based y-axis

I am looking to create a stacked bar chart where the y-axis measures the value but each segment is labelled with its % of the bar total.
I think I need to add a pct column to my data frame and then use that, but I am not sure how to compute the pct column either.
For example, the data frame is:
date, type, value, pct
Jan 1, A, 5, 45% (5/11)
Jan 1, B, 6, 55% (6/11)
Maybe something like this?
library(dplyr)
library(ggplot2)
test.df <- data.frame(date = c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"),
                      type = c("A", "B", "A", "B"),
                      val = c(5:6, 1, 7))
test.df <- test.df %>%
  group_by(date) %>%
  mutate(
    prop = val / sum(val),                            # share of the date's total
    y_text_pos = ifelse(type == "B", val, sum(val))   # top of each segment; B is stacked below A
  ) %>%
  ungroup()
ggplot(data = test.df, aes(x = as.Date(date), y = val, fill = type)) +
  geom_col() +
  geom_text(aes(y = y_text_pos, label = paste0(round(prop * 100, 1), "%")), color = "black", vjust = 1.1)
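If the aim is only the pct column plus a label centred in each segment, a base R variant is sketched below, assuming the same toy data as the answer. ave() divides each value by its date's total, and position_stack(vjust = 0.5) stacks the labels exactly like the bars and centres each one in its own segment, so no y_text_pos column has to be computed by hand. The column name pct_label is illustrative.

```r
library(ggplot2)

test.df <- data.frame(
  date = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02")),
  type = c("A", "B", "A", "B"),
  val  = c(5, 6, 1, 7)
)

# Share of each value within its date: ave() applies FUN separately per date
test.df$pct <- ave(test.df$val, test.df$date, FUN = function(v) v / sum(v))
test.df$pct_label <- paste0(round(test.df$pct * 100, 1), "%")

# position_stack(vjust = 0.5) puts each label in the middle of its own segment
p <- ggplot(test.df, aes(x = date, y = val, fill = type)) +
  geom_col() +
  geom_text(aes(label = pct_label), position = position_stack(vjust = 0.5))
```

Draw it with print(p). Because the labels use the same stacking as the bars, this keeps working if more types are added per date.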

Column colors not matching custom scale

I've put together a barplot that gives counts for phenotypes in the maxilla and mandible. I'd like the maxilla bars to be black and the mandible bars grey. I've created a color vector for the fill option, but no luck: the colors are still out of order.
I've already used this code successfully on a number of other barplots. I believe the problem is that I have combined 3 data frames into one using rbind, although the structure of the combined data frame is no different from the uncombined ones, which do work.
The first four bars should be black, the last four bars should be grey.
### 3 data sets
a<-data.frame(
row.names = c("RI2_MAX_E1","ri2_mand_E1","rc1_mand_E1"),
count = c(2,2,2),
labels = c("RI2", "ri2", "rc1")
)
b<-data.frame(
row.names = c("RP3_MAX_E1","RP4_MAX_E1","rp3_mand_E1"),
count = c(3,3,2),
labels = c("RP3", "RP4", "rp3")
)
c<-data.frame(
row.names = c("RM3_MAX_E1","rm3_mand_E1"),
count = c(5,6),
labels = c("RM3", "rm3")
)
### Bind datasets into 1
E1.bind<-rbind(a,b,c)
### order variables
E1.bind$labels<-factor(E1.bind$labels, levels =c("RI2","RP3","RP4","RM3","ri2","rc1","rp3","rm3"))
### Custom scale color
E1.color<-c("black","black","black","black","grey","grey","grey","grey")
### plot
ggplot(data= E1.bind, aes(x=E1.bind$labels, y=E1.bind$count,fill=E1.color)) +
geom_bar(stat="identity") +
xlab("Teeth") + ylab("Phenotype counts") +
ggtitle("Teeth With Greatest Number of Phenotypes - Element 1")+
scale_fill_manual(name="",
labels = c("Maxilla","Mandible"),
values = c("black","grey"))+
scale_x_discrete(labels = c("RI2","RP3","RP4","RM3","ri2","rc1","rp3","rm3")) +
scale_y_continuous(breaks = seq(0,6,1)) +
theme_classic()+
theme(legend.position="top")+
theme(plot.title = element_text(hjust = 0.5))
Your E1.bind looks like this:
E1.bind
count labels
RI2_MAX_E1 2 RI2
ri2_mand_E1 2 ri2
rc1_mand_E1 2 rc1
RP3_MAX_E1 3 RP3
RP4_MAX_E1 3 RP4
rp3_mand_E1 2 rp3
RM3_MAX_E1 5 RM3
rm3_mand_E1 6 rm3
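Note the order of the rows: rbind() keeps each data frame's rows together, so they do not follow the factor levels you set. You are then using this vector as fill: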
E1.color<-c("black","black","black","black","grey","grey","grey","grey")
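Because fill takes this vector row by row, the first four rows (RI2, ri2, rc1, RP3) are coloured black and the remaining four grey, regardless of jaw. A better way is to add a Type column to your data frame and use it to define the fill color. That way it is also more scalable: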
library(dplyr)
library(ggplot2)
a<-data.frame(
row.names = c("RI2_MAX_E1","ri2_mand_E1","rc1_mand_E1"),
count = c(2,2,2),
labels = c("RI2", "ri2", "rc1")
)
b<-data.frame(
row.names = c("RP3_MAX_E1","RP4_MAX_E1","rp3_mand_E1"),
count = c(3,3,2),
labels = c("RP3", "RP4", "rp3")
)
c<-data.frame(
row.names = c("RM3_MAX_E1","rm3_mand_E1"),
count = c(5,6),
labels = c("RM3", "rm3")
)
### Bind datasets into 1
E1.bind<-rbind(a,b,c)
E1.bind$Type <- ifelse(grepl('^R', E1.bind$labels), "Maxilla", "Mandible")  # upper-case labels are maxillary teeth
### Sort by Type
E1.bind <- arrange(E1.bind, desc(Type))
### plot
ggplot(data= E1.bind, aes(x=labels, y=count, fill=Type)) +
geom_bar(stat="identity") +
xlab("Teeth") + ylab("Phenotype counts") +
ggtitle("Teeth With Greatest Number of Phenotypes - Element 1") +
scale_y_continuous(breaks = seq(0,6,1)) +
scale_x_discrete(limits=E1.bind$labels) +
scale_fill_manual(values = c("Maxilla" = "black", "Mandible" = "grey")) +
theme_classic()+
theme(legend.position="top")+
theme(plot.title = element_text(hjust = 0.5))
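A sketch of a variant that keeps each colour attached to its own row, assuming the same eight rows as E1.bind (built here in one data.frame() call for brevity). It derives Type from the _MAX_ / _mand_ part of the row names (an assumption about your naming scheme), fixes the x order with factor levels, and uses a named values vector so the colours depend neither on row order nor on level order. ggplot_build() is used only to check the bar fills; bar_fill is an illustrative name.

```r
library(ggplot2)

E1.bind <- data.frame(
  row.names = c("RI2_MAX_E1", "ri2_mand_E1", "rc1_mand_E1",
                "RP3_MAX_E1", "RP4_MAX_E1", "rp3_mand_E1",
                "RM3_MAX_E1", "rm3_mand_E1"),
  count  = c(2, 2, 2, 3, 3, 2, 5, 6),
  labels = c("RI2", "ri2", "rc1", "RP3", "RP4", "rp3", "RM3", "rm3")
)

# Take the jaw from each row's own name, so the colour travels with the row
E1.bind$Type <- ifelse(grepl("_MAX_", rownames(E1.bind)), "Maxilla", "Mandible")

# Order of the x axis is set by the factor levels, not by the row order
E1.bind$labels <- factor(E1.bind$labels,
                         levels = c("RI2", "RP3", "RP4", "RM3", "ri2", "rc1", "rp3", "rm3"))

# Named values tie each colour to a Type level
p <- ggplot(E1.bind, aes(x = labels, y = count, fill = Type)) +
  geom_col() +
  scale_fill_manual(name = "",
                    values = c(Maxilla = "black", Mandible = "grey"),
                    breaks = c("Maxilla", "Mandible")) +
  labs(x = "Teeth", y = "Phenotype counts") +
  theme_classic() +
  theme(legend.position = "top")

# Fill colour of each bar, in x-axis order
built <- ggplot_build(p)$data[[1]]
bar_fill <- built$fill[order(as.numeric(built$x))]
print(bar_fill)
```

With this approach, sorting with arrange() or passing limits to scale_x_discrete() is no longer needed to get the colours right.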

Problem with alignment of geom_point and geom_errorbar

I am trying to plot how different predictors associate with stroke and with underlying phenotypes (e.g. cholesterol). I originally had working ggplot code in which shape denoted the variable (stroke, HDL cholesterol or total cholesterol) and colour denoted the type (disease, i.e. stroke, or phenotype, i.e. HDL/total cholesterol). To make the plot more intuitive, I want to swap shape and colour, but since doing this, position_dodge() no longer aligns geom_point and geom_errorbar.
stroke_graph <- ggplot(stroke,aes(y=as.numeric(stroke$test),
x=Clock,
shape = Type,
colour = Variable)) +
geom_point(data=stroke, aes(shape=Type, colour=Variable), show.legend=TRUE,
position=position_dodge(width=0.5), size = 3) +
geom_errorbar(aes(ymin = as.numeric(stroke$LCI), ymax= as.numeric(stroke$UCI)),
position = position_dodge(0.5), width = 0.05,
colour ="black")+
ylab("standardised beta/log odds")+ xlab ("")+
geom_hline(yintercept = 0, linetype = "dotted")+
theme(axis.text.x = element_text(size = 10, vjust = 0.5), legend.position = "none",
plot.title = element_text(size = 12))+
scale_y_continuous(limit = c(-0.402, 0.7))+ scale_shape_manual(values=c(15, 17, 18))+
theme(legend.position="right") + labs(shape = "Variable") + guides(shape = guide_legend(reverse=TRUE)) +
coord_flip()
stroke_graph + ggtitle("Stroke and Associated Phenotypes") + theme(plot.title = element_text(hjust = 0.5))
The previously working graph came from identical code, except that "Type" and "Variable" were swapped.
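A likely cause, offered as an assumption rather than a confirmed answer: position_dodge() dodges by the group aesthetic, which ggplot2 builds from all discrete aesthetics a layer has. The points get colour = Variable and shape = Type, but the error bars (with colour set to "black") only inherit shape = Type, so the two layers are split into different numbers of groups and dodge to different offsets. In the old version Type and Variable were the other way round, which happened to give matching groups. Setting group = Variable in the shared aes() gives both layers the same groups. The toy data below is hypothetical, and columns are referenced by bare name instead of stroke$... inside aes().

```r
library(ggplot2)

# Hypothetical stand-in for `stroke`: two clocks, three variables, two types
stroke <- data.frame(
  Clock    = rep(c("Clock A", "Clock B"), each = 3),
  Variable = rep(c("Stroke", "HDL cholesterol", "Total cholesterol"), times = 2),
  Type     = rep(c("Disease", "Phenotype", "Phenotype"), times = 2),
  test     = c(0.30, 0.10, -0.05, 0.25, 0.05, -0.10)
)
stroke$LCI <- stroke$test - 0.1
stroke$UCI <- stroke$test + 0.1

dodge <- position_dodge(width = 0.5)

# group = Variable makes points and error bars dodge by the same variable
p <- ggplot(stroke, aes(x = Clock, y = test, colour = Variable, shape = Type,
                        group = Variable)) +
  geom_errorbar(aes(ymin = LCI, ymax = UCI), position = dodge,
                width = 0.05, colour = "black") +
  geom_point(position = dodge, size = 3) +
  geom_hline(yintercept = 0, linetype = "dotted") +
  labs(x = "", y = "standardised beta/log odds") +
  coord_flip()

# Dodged x positions of both layers (before coord_flip swaps the axes)
built <- ggplot_build(p)
errorbar_x <- sort(as.numeric(built$data[[1]]$x))
point_x    <- sort(as.numeric(built$data[[2]]$x))
```

Drawing the error bars before the points also keeps the points on top.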

ggplot2: How to move y axis labels right next to the bars

I am working with the following reproducible dataset:
set.seed(123)  # runif() gives different values on every run otherwise
df <- data.frame(name = c(letters[1:10], letters[1:10]), fc = runif(20, -5, 5),
                 fdr = runif(20), group = c(rep("gene", 10), rep("protein", 10)))
Code used to plot:
df$sig<- ifelse(df$fdr<0.05 & df$fdr>0 ,"*","")
ggplot(df, aes(x=reorder(name,fc),fc))+geom_col(aes(fill=group),position = "dodge",width = 0.9)+
coord_flip()+
geom_text(aes(label = sig),angle = 90, position = position_stack(vjust = -0.2), color= "black",size=3)+
scale_y_continuous(position = "right")+
scale_fill_manual(values = c("gene"= "#FF002B","protein"="blue"))+
geom_hline(yintercept = 0, colour = "gray" )+
theme(legend.position="none", axis.title.y=element_blank(),
axis.title.x=element_blank(),
axis.text.y=element_text(),
axis.line=element_line(color="gray"),axis.line.y=element_blank(),
axis.ticks.y=element_blank(),
panel.background=element_blank(),panel.border=element_blank(),panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),plot.background=element_blank())
Instead of having the y-axis labels on the left side, I would like to place them right next to the bars, emulating this chart published in Nature:
https://www.nature.com/articles/ncomms2112/figures/3
Like this?
df<- data.frame(name=c(letters[1:10],letters[1:10]),fc=runif(20,-5,5)
,fdr=runif(20),group=c(rep("gene",10),rep("protein",10)))
df$sig<- ifelse(df$fdr<0.05 & df$fdr>0 ,"*","")
df$try<-c(1:10,1:10) #assign numbers to letters
x_pos<-ifelse(df$group=='gene',df$try-.225,df$try+.225) #align letters over bars: with width 0.9 and two groups, dodged bars are centred at try -/+ 0.225
y_posneg<-ifelse(df$fc>0,df$fc+.5,df$fc-.5) #set up y axis position of letters
ggplot(df, aes(x=try,fc))+geom_col(aes(fill=group),position = "dodge",width = 0.9)+
coord_flip()+
geom_text(aes(y=y_posneg,x=x_pos,label = name),color= "black",size=6)+
scale_y_continuous(position = "right")+
scale_fill_manual(values = c("gene"= "#FF002B","protein"="blue"))+
geom_hline(yintercept = 0, colour = "gray" )+
theme(legend.position="none", axis.title.y=element_blank(),
axis.title.x=element_blank(),
axis.text.y=element_blank(),
axis.line=element_line(color="gray"),axis.line.y=element_blank(),
axis.ticks.y=element_blank(),
panel.background=element_blank(),panel.border=element_blank(),panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),plot.background=element_blank())
Or perhaps this?
x_pos<-ifelse(df$group=='gene',df$try-.225,df$try+.225) #align letters over bars: with width 0.9 and two groups, dodged bars are centred at try -/+ 0.225
y_pos<-ifelse(df$fc>0,-.2,.2) #set up y axis position of letters
ggplot(df, aes(x=try,fc))+geom_col(aes(fill=group),position = "dodge",width = 0.9)+
coord_flip()+
geom_text(aes(y=y_pos,x=x_pos,label = name),color= "black",size=3)+
scale_y_continuous(position = "right")+
scale_fill_manual(values = c("gene"= "#FF002B","protein"="blue"))+
geom_hline(yintercept = 0, colour = "gray" )+
theme(legend.position="none", axis.title.y=element_blank(),
axis.title.x=element_blank(),
axis.text.y=element_blank(),
axis.line=element_line(color="gray"),axis.line.y=element_blank(),
axis.ticks.y=element_blank(),
panel.background=element_blank(),panel.border=element_blank(),panel.grid.major=element_blank(),
panel.grid.minor=element_blank(),plot.background=element_blank())
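An alternative sketch, not taken from the thread: keep the label positions as columns of df and let position_dodge() put the labels into the same slots as the bars, instead of hard-coding x offsets. Assumptions: a fixed set.seed() so the random data repeats, labels placed just across zero from each bar (as in the second variant above), and a mapped hjust so labels left of zero are right-aligned. The text layer inherits fill = group, which ggplot2 uses for grouping, so both layers are dodged identically. The column names label_y and label_hjust are illustrative.

```r
library(ggplot2)

set.seed(1)
df <- data.frame(
  name  = c(letters[1:10], letters[1:10]),
  fc    = runif(20, -5, 5),
  group = c(rep("gene", 10), rep("protein", 10))
)

# Label placement lives in the data frame, one value per bar
df$label_y     <- ifelse(df$fc > 0, -0.2, 0.2)  # just across zero from the bar
df$label_hjust <- ifelse(df$fc > 0, 1, 0)        # right-align left of zero, left-align right of zero

dodge <- position_dodge(width = 0.9)

p <- ggplot(df, aes(x = name, y = fc, fill = group)) +
  geom_col(position = dodge, width = 0.9) +
  # inherits fill = group, so the labels are dodged into the same slots as the bars
  geom_text(aes(y = label_y, label = name, hjust = label_hjust),
            position = dodge, size = 3) +
  geom_hline(yintercept = 0, colour = "gray") +
  scale_fill_manual(values = c(gene = "#FF002B", protein = "blue")) +
  coord_flip() +
  theme_minimal() +
  theme(legend.position = "none", axis.title = element_blank(),
        axis.text.y = element_blank(), panel.grid = element_blank())

# Dodged positions of bars and labels (before coord_flip swaps the axes)
built <- ggplot_build(p)
bar_x   <- sort(as.numeric(built$data[[1]]$x))
label_x <- sort(as.numeric(built$data[[2]]$x))
```

Sorting by fold change, as in the question, would only need x = reorder(name, fc) in the shared aes().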