Overriding the legend title with a user-specified string instead of the variable name - legend

I have the code below, which uses the ggplot2 package. I am trying to plot the variables 'Company Advertising' and 'Brand Revenue' of my data frame 'htmltable' against each other, for the rows where another variable, 'Industry', is 'Luxury', using the ggplot() function. I am using another variable of my data frame, 'Brand Value', as the colour variable.
p <- ggplot(htmltable[htmltable$Industry == 'Luxury', ], aes(x = CompanyAdvertising, y = BrandRevenue))
q <- p + geom_point(aes(color = BrandValue, size = BrandValue)) + geom_text(aes(label = Brand))
r <- q + xlab("Company Advertisement in Billions") + ylab("Brand Revenue in Billions") + ggtitle("Luxury")
r+theme(plot.title=element_text(size=10,face='bold'),legend.key=element_rect(fill='light blue'))
Here, I want to change my legend title from "BrandValue" to "BrandValue in Billions". Please suggest.
I tried using labs() in the statement below, but it results in two legends.
r <- q + xlab("Company Advertisement in Billions") + ylab("Brand Revenue in Billions") + ggtitle("Luxury") + labs(colour = "BrandValue in Billions")

Have you tried setting the same title for both aesthetics?
+labs(colour="BrandValue in Billions",
size="BrandValue in Billions")

Aesthetics: fill - Warning occurs when plotting a ggplot boxplot

I've written an R chunk that should produce a coloured ggplot boxplot. All required packages are loaded, and so is the data.
The columns "Healthy" and "BodyTemperature" are in the data frame "Hospital".
Healthy can only be 0 or 1.
The chunk should plot two boxplots next to each other on the x-axis, one for Healthy (0) and the other for Unhealthy (1), against the patients' BodyTemperature on the y-axis.
The boxplots should be coloured with a Brewer palette.
Every time I try to run this chunk, an error occurs. What's the solution?
colour:
colour <- brewer.pal(n = 2, name = "Set1")
colour
Warning: minimal value for n is 3, returning requested palette with 3 different levels
[1] "#E41A1C" "#377EB8" "#4DAF4A"
R chunk:
colour = brewer.pal(n = 2, name = "Set1")
ggplot(Hospital, aes(x = Healthy, y = BodyTemperature)) +
geom_boxplot(fill=c(colour)) +
ylab("Temperature") +
xlab("Healthy") +
ggtitle("Health compared to Temperature")
The error that occurs:
Error in `check_aesthetics()`:
! Aesthetics must be either length 1 or the same as the data (1): fill
Backtrace:
1. base (local) `<fn>`(x)
2. ggplot2:::print.ggplot(x)
4. ggplot2:::ggplot_build.ggplot(x)
5. ggplot2 (local) by_layer(function(l, d) l$compute_geom_2(d))
6. ggplot2 (local) f(l = layers[[i]], d = data[[i]])
7. l$compute_geom_2(d)
8. ggplot2 (local) f(..., self = self)
9. self$geom$use_defaults(data, self$aes_params, modifiers)
10. ggplot2 (local) f(..., self = self)
11. ggplot2:::check_aesthetics(params[aes_params], nrow(data))
Error in check_aesthetics(params[aes_params], nrow(data)) :
The error arises because brewer.pal returns three colours (its minimum, as the warning says), while a fill set outside aes() must have length 1 or match the data. Since you want to color your boxplots by the value of Healthy, map Healthy to the fill aesthetic instead. To use one of the Brewer palettes, ggplot2 already offers convenience functions; for the fill aesthetic it is scale_fill_brewer. I'm not sure whether you want a legend, but IMHO it does not make sense here, so I removed it via guides. Finally, as you provided no data, it's not clear whether your Healthy column is numeric or categorical, so I wrapped it in factor to make it categorical.
Using some fake random example data:
set.seed(123)
library(ggplot2)
Hospital <- data.frame(
Healthy = rep(c(0, 1), 50),
BodyTemperature = runif(100)
)
ggplot(Hospital, aes(x = factor(Healthy), y = BodyTemperature)) +
geom_boxplot(aes(fill = factor(Healthy))) +
scale_fill_brewer(palette = "Set1") +
ylab("Temperature") +
xlab("Healthy") +
ggtitle("Health compared to Temperature") +
guides(fill = "none")

R Apply across a DF column with a dynamic column selection

I've been trying to figure out how to calculate across a DF using lapply/tapply/apply while dynamically selecting the column to use. I have a DF of wind speed at different heights (Date, Year, Wind.100, Wind.120, Wind.80, etc.) and I want to do calculations on a height that varies depending on which turbine I am simulating.
Height <- 100
Level <- paste0("Wind.", Height)
I've tried:
tapply(df[[Level]], list(DF$Hour, DF$Year), mean)
tapply(df[Level], list(DF$Hour, DF$Year), mean)
tapply(df[,..Level], list(DF$Hour, DF$Year), mean)
but it fails and says: object 'Level' not found.
There has got to be a way to script this. I know doing a
paste0("df$Wind.",Height)
doesn't work but I can't figure it out.

geom_nodelabel_repel() position for circular ggraph plot

I have a network diagram that looks like this:
I made it using ggraph and added the labels using geom_nodelabel_repel() from ggnetwork:
( ggraph_plot <- ggraph(layout) +
geom_edge_fan(aes(color = as.factor(responses), edge_width = as.factor(responses))) +
geom_node_point(aes(color = as.factor(group)), size = 10) +
geom_nodelabel_repel(aes(label = name, x=x, y=y), segment.size = 1, segment.color = "black", size = 5) +
scale_color_manual("Group", values = c("#2b83ba", "#d7191c", "#fdae61")) +
scale_edge_color_manual("Frequency of Communication", values = c("Once a week or more" = "#444444","Monthly" = "#777777",
"Once every 3 months" = "#888888", "Once a year" = "#999999"),
limits = c("Once a week or more", "Monthly", "Once every 3 months", "Once a year")) +
scale_edge_width_manual("Frequency of Communication", values = c("Once a week or more" = 3,"Monthly" = 2,
"Once every 3 months" = 1, "Once a year" = 0.25),
limits = c("Once a week or more", "Monthly", "Once every 3 months", "Once a year")) +
theme_void() +
theme(legend.text = element_text(size=16, face="bold"),
legend.title = element_text(size=16, face="bold")) )
I want the labels on the left side of the plot to sit off to the left, and the labels on the right side of the plot to sit off to the right. I want to do this because the actual labels are quite long (organization names) and they get in the way of the lines in the actual plot.
How can I do this using geom_nodelabel_repel()? I've tried different combinations of box_padding and point_padding, as well as h_just and v_just, but these apply to all labels and there doesn't seem to be a way to subset or position specific points.
Apologies for not providing a reproducible example but I wasn't sure how to do this without compromising the identities of respondents from my survey.
Well, there is always the manually intensive, yet effective, method of adding the geom_nodelabel_repel function separately for the nodes on the "left" vs. the "right" of the plot. It's not at all elegant and probably bad coding practice, but I've done similar things myself when I couldn't figure out an elegant solution. It works really well when you don't have a very large dataset to begin with and you are not planning to make the same plot over and over again. Basically, it would entail:
1. Identifying whether there is a property in your dataset that places points on the "left" vs. the "right". In this case it doesn't look like there is, so you would have to manually create a list of the entries on the "left" vs. the "right" of your plot.
2. Using separate calls to geom_nodelabel_repel with different nudge_x values. Use any reasonable method to subset the "left" and "right" data points. You can create a new column in the dataset, or subset in-line like data = subset(your.data.frame, property %in% left.list)
For example, if you created a column called subset.side, being either "left" or "right", in your data.frame (here: your.data.frame), your calls to geom_nodelabel_repel might look something like:
geom_nodelabel_repel(
data=subset(your.data.frame, subset.side=='left'),
aes(label=name, x=x, y=y), segment.size=1, segment.color='black', size=5,
nudge_x=-10
) +
geom_nodelabel_repel(
data=subset(your.data.frame, subset.side=='right'),
aes(label=name, x=x, y=y), segment.size=1, segment.color='black', size=5,
nudge_x=10
) +
Alternatively, you can create a list based on the label name itself--let's say you called those lists names.left and names.right, where you can subset accordingly by swapping in as represented in the pseudo code below:
geom_nodelabel_repel(
data=subset(your.data.frame, name %in% names.left),...
nudge_x = -10, ...
) +
geom_nodelabel_repel(
data=subset(your.data.frame, name %in% names.right),...
nudge_x = 10, ...
)
To be fair, I have not worked with the node geoms before, so I am assuming here that the positioning of the labels will not affect the mapping (as it would not with other geoms).

Conditional If Statement: If value in row starts with letter in string … set another column with some corresponding value

I have the 'Field_Type' column filled with strings and I want to derive the values in the 'Units' column using an if statement.
So Units shows the desired result. Essentially I want to call out what type of activity is occurring.
I tried to do this using my code below but it won't run (please see screen shot below for error). Any help is greatly appreciated!
create_table['Units'] = pd.np.where(create_table['Field_Name'].str.startswith("W"), "MW",
pd.np.where(create_table['Field_Name'].str.contains("R"), "MVar",
pd.np.where(create_table['Field_Name'].str.contains("V"), "Per Unit")))
ValueError: either both or neither of x and y should be given
You can write a function that defines your conditionals, then use apply on the dataframe and pass it the function:
def unit_mapper(row):
    # Checks run in order, so a value starting with 'W' never reaches the later branches
    if row['Field_Type'].startswith('W'):
        return 'MW'
    elif 'R' in row['Field_Type']:
        return 'MVar'
    elif 'V' in row['Field_Type']:
        return 'Per Unit'
    else:
        return 'N/A'
And then
create_table['Units'] = create_table.apply(unit_mapper, axis=1)
In your text you talk about Field_Type, but your example uses Field_Name. Which one is correct?
You want to do something like:
create_table.loc[create_table['Field_Type'].str.startswith('W'), 'Units'] = 'MW'
create_table.loc[create_table['Field_Type'].str.startswith('R'), 'Units'] = 'MVar'
create_table.loc[create_table['Field_Type'].str.startswith('V'), 'Units'] = 'Per Unit'
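Another option, offered as a sketch under assumptions rather than as the definitive fix, is numpy.select, which keeps the check vectorised. The ValueError in the question comes from the innermost np.where, which receives a condition and one value but no fallback value. The sample rows and the 'N/A' default below are made up for illustration, and the Field_Type column name is assumed from the question text.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the original post did not include any.
create_table = pd.DataFrame(
    {"Field_Type": ["W_Load", "R_Power", "V_Bus", "Other"]}
)

field = create_table["Field_Type"]

# Conditions are evaluated in order; the first one that matches wins.
conditions = [
    field.str.startswith("W"),
    field.str.contains("R", regex=False),
    field.str.contains("V", regex=False),
]
choices = ["MW", "MVar", "Per Unit"]

# Rows matching none of the conditions get the default (assumed here to be 'N/A').
create_table["Units"] = np.select(conditions, choices, default="N/A")

print(create_table)
```

Because the conditions are tried in order, the result matches the if/elif chain in the apply-based answer, without calling a Python function once per row.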

One ggplot from two data frames (1 bar each)

I was looking for an answer everywhere, but I just couldn't find one to this problem (maybe I was just too stupid to use other answers, because I'm new to R).
I have two data frames with different numbers of rows. I want to create a plot containing a single bar per data frame. Both bars should have the same height, and the counts of the different values should be stacked on top of each other. For example: I want to compare the proportions of gender in those two data sets.
t1<-data.frame(cbind(c(1:6), factor(c(1,2,2,1,2,2))))
t2<-data.frame(cbind(c(1:4), factor(c(1,2,2,1))))
1 represents male, 2 represents female
I want to create two bars next to each other that show that the proportion of gender in the first data frame is 2:4 and in the second one 2:2.
My attempt looked like this:
ggplot() + geom_bar(aes(1, t1$X2, position = "fill")) + geom_bar(aes(1, t2$X2, position = "fill"))
That leads to the error: "Error: stat_count() must not be used with a y aesthetic."
First you should merge the two data frames. To do that, you need a variable that identifies the origin of the data, so add a column with an ID (like t1 and t2) to both data frames. Keep in mind that your column names are the same in both frames, so you will be able to use the function rbind.
t1$data <- "t1"
t2$data <- "t2"
t <- (rbind(t1,t2))
Now you can make the plot:
ggplot(t[order(t$X2),], aes(data, X2, fill=factor(X2))) +
geom_bar(stat="identity", position="stack")