Can the grc1leg command combine the legend for multiple kdensity plots in Stata?

I am creating kdensity curves for income (kerr_income) distributions in a year (taxyear) by gender (male). However, when I combine the kdensity plots, I cannot get a single legend that distinguishes the curves by gender rather than listing each curve separately. I want one legend that shows only a male and a female distribution.
kdensity kerr_income if male==0 & taxyear == 2008, addplot(kdensity kerr_income if male==1 & ///
taxyear == 2008) name(g1, replace) ///
legend(label(1 "Female") label(2 "Male"))
kdensity kerr_income if male==0 & taxyear == 2009, legend(off) name(g2, replace) ///
addplot(kdensity kerr_income if male==1 & taxyear == 2009, legend(off) name(g3, replace) || ///
kdensity kerr_income if male==0 & taxyear == 2010, legend(off) name(g4, replace) || ///
kdensity kerr_income if male==1 & taxyear == 2010, legend(off) name(g5, replace))
grc1leg g1 g2 g3 g4 g5, legendfrom(g1)
I am using Stata 14. Any help would be appreciated!

I haven't tried grc1leg on this kind of problem, or indeed much at all, but the problem you're facing can be avoided altogether. The idea is to put the density estimates into a data layout that lets you use the by() option. by() arranges all the scaffolding (panels, axes, titles) for you and shows a single legend instead of one per graph.
Here is a demonstration. Note that you need multidensity from SSC.
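* want a 2 x 4 grouping: 4 price quartiles x domestic/foreign
sysuse auto, clear
xtile qprice=price, nq(4)
preserve
* one group per combination of price quartile and foreign (8 groups)
egen group = group(qprice foreign), label
tab group
* ssc install multidensity
* estimate a density of mpg for each group on a common grid (_x1, _density1, ...)
multidensity generate mpg, by(group) min(5) max(45)
keep _x* _density*
gen id = _n
* stack the per-group results into long form; g holds the group number
reshape long _density _x, i(id) j(g)
* even group numbers are foreign cars, odd ones domestic
gen foreign = !mod(g, 2)
* one density variable per origin, so each origin gets one line and one legend key
separate _density, by(foreign)
label var _density0 "Domestic"
label var _density1 "Foreign"
* G indexes the price quartile and defines the panels
gen G = ceil(g/2)
line _density? _x, by(G, note("Quartile bins of price")) sort xla(5(10)45) xtitle(Miles per gallon) ytitle(Density) name(demo, replace)
restore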
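The key idea in the answer has two parts. First, compute every density on a shared grid. Second, stack the results in long form, so a single plotting command with one grouping variable draws all curves and builds one legend.

Below is a minimal Python sketch of that data step, not of multidensity itself. A few assumptions apply:
- It uses a Gaussian kernel with a Silverman-style rule-of-thumb bandwidth. This is an assumption; Stata's kdensity defaults to the Epanechnikov kernel.
- The names gaussian_kde and densities_long are illustrative only.
- The income data in the usage lines is made up.

```python
import math
from collections import defaultdict


def gaussian_kde(sample, grid, bandwidth=None):
    """Gaussian kernel density estimate of `sample`, evaluated at every point of `grid`."""
    n = len(sample)
    if bandwidth is None:
        mean = sum(sample) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
        bandwidth = 1.06 * sd * n ** (-1 / 5)  # Silverman-style rule of thumb (assumption)
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive; is the sample constant?")
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return [norm * sum(math.exp(-0.5 * ((x - v) / bandwidth) ** 2) for v in sample)
            for x in grid]


def densities_long(rows, value_key, group_key, lo, hi, points=101):
    """Estimate one density per group on a shared grid and return long-format
    records (group, x, density), ready for one plot call grouped by `group_key`."""
    step = (hi - lo) / (points - 1)
    grid = [lo + i * step for i in range(points)]
    samples = defaultdict(list)
    for row in rows:
        samples[row[group_key]].append(row[value_key])
    records = []
    for group in sorted(samples):
        for x, d in zip(grid, gaussian_kde(samples[group], grid)):
            records.append({group_key: group, "x": x, "density": d})
    return records


# Usage with made-up incomes: two groups (male = 0/1) on one grid from 0 to 60.
rows = [{"male": i % 2, "income": 20 + 3 * (i % 7) + 5 * (i % 2)} for i in range(60)]
long_records = densities_long(rows, "income", "male", 0, 60)
```

Because every curve shares the same x grid and is identified only by the group value, one legend entry per group is enough. That is the same reason the by() approach avoids duplicate legends.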


ggplot: Reorder facet rows in facet_grid

I am having issues reordering my rows using facet_grid. I have four groups that by default are ordered alphabetically. Using the factor() function I can reorder the groups, but they switch to columns (horizontal), and I need rows (vertical).
Example subset of my data:
Site               Date     Analyte  Result
SJR#Calaveres      5/26/22  Top TP   0.35
SJR#Ship Channel   5/26/22  Top TP   0.56
Turning Basin      5/26/22  Top TP   0.46
Morelli Boat Ramp  5/26/22  Top TP   0.45
Public Dock        5/26/22  Top TP   0.55
wi_daily_wq<-read.csv("nutrients_p.csv")
p <- ggplot(data = wi_daily_wq, aes(x = Date, y = Result))+
geom_point(aes(color = Site)) +
theme_bw()
p
# Add vertical facets, aka divide the plot up vertically since they share an x axis
p + facet_grid(Analyte ~ .)
# Add vertical facets, but scale only the y axes freely
p + facet_grid(Analyte ~ ., scales = "free_y")
p + facet_grid(Analyte ~ ., scales = "free_y",
switch = "y") # flip the facet labels along the y axis from the right side to the left
#Trying to organize rows in order that I want
p + facet_grid(~factor (Analyte,levels=c('Bottom TP','Top TP','Bottom PO4','Top PO4')))
ylab(NULL) + # remove the word "values"
theme(strip.background = element_blank(), # remove the background
strip.placement = "outside") # put labels to the left of the axis text
If I remove the following line from the code above, I get the row layout I like, but the rows are in alphabetical order rather than the order I need.
p + facet_grid(~factor (Analyte,levels=c('Bottom TP','Top TP','Bottom PO4','Top PO4')))
I'm wondering what adjustment I need to make to my code to organize the rows in the order I need them without converting the facets to columns.

Aesthetics: fill - warning occurs when plotting a ggplot boxplot

I've written an R chunk that should produce a coloured ggplot boxplot. All needed packages are loaded, and so is the data.
The columns "Healthy" and "BodyTemperature" are in the data frame "Hospital".
Healthy can only be 0 or 1.
It should plot two boxplots next to each other on the x-axis, one showing Healthy (0) and the other Unhealthy (1), against the patients' BodyTemperature on the y-axis.
The boxplots should be coloured with a ColorBrewer palette.
Every time I try to run this chunk, I get a warning and then an error. What's the solution?
colour:
colour <- brewer.pal(n = 2, name = "Set1")
colour
Warning: minimal value for n is 3, returning requested palette with 3 different levels
[1] "#E41A1C" "#377EB8" "#4DAF4A"
R chunk:
colour = brewer.pal(n = 2, name = "Set1")
ggplot(Hospital, aes(x = Healthy, y = BodyTemperature)) +
geom_boxplot(fill=c(colour)) +
ylab("Temperature") +
xlab("Healthy") +
ggtitle("Health compared to Temperature")
The following error occurs:
Error in `check_aesthetics()`:
! Aesthetics must be either length 1 or the same as the data (1): fill
Backtrace:
1. base (local) `<fn>`(x)
2. ggplot2:::print.ggplot(x)
4. ggplot2:::ggplot_build.ggplot(x)
5. ggplot2 (local) by_layer(function(l, d) l$compute_geom_2(d))
6. ggplot2 (local) f(l = layers[[i]], d = data[[i]])
7. l$compute_geom_2(d)
8. ggplot2 (local) f(..., self = self)
9. self$geom$use_defaults(data, self$aes_params, modifiers)
10. ggplot2 (local) f(..., self = self)
11. ggplot2:::check_aesthetics(params[aes_params], nrow(data))
Error in check_aesthetics(params[aes_params], nrow(data)) :
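The error arises because fill = colour sets a fixed vector of colours. brewer.pal() never returns fewer than 3 colours, while ggplot presumably draws only a single box because Healthy is numeric. A length-3 fill therefore does not match the one row of box data.

As you want to colour your boxplots by the value of Healthy, map Healthy onto the fill aesthetic instead. To use one of the Brewer palettes, ggplot2 already offers convenience functions; for the fill aesthetic it is scale_fill_brewer().

I'm not sure whether you want a legend. In my opinion it is redundant here, since the x-axis already labels the groups, so I removed it via guides().

Finally, as you provided no data, it's not clear whether your Healthy column is numeric or categorical. For this reason I wrapped it in factor() to make it categorical.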
Using some fake random example data:
set.seed(123)
library(ggplot2)
Hospital <- data.frame(
Healthy = rep(c(0, 1), 50),
BodyTemperature = runif(100)
)
ggplot(Hospital, aes(x = factor(Healthy), y = BodyTemperature)) +
geom_boxplot(aes(fill = factor(Healthy))) +
scale_fill_brewer(palette = "Set1") +
ylab("Temperature") +
xlab("Healthy") +
ggtitle("Health compared to Temperature") +
guides(fill = "none")
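The fix comes down to two rules:
- A fixed fill must be either one colour or one colour per box.
- A mapped fill assigns one palette colour per level of the grouping variable.

The Python sketch below models both rules. It is a simplified illustration, not ggplot2's implementation, and check_constant_fill and map_fill are hypothetical names. The palette values are the three Set1 colours printed in the question.

```python
# Simplified model (not ggplot2 code) of why fill = <vector> fails and a mapped fill works.

SET1 = ["#E41A1C", "#377EB8", "#4DAF4A"]  # colours printed by brewer.pal(3, "Set1")


def check_constant_fill(fill, n_boxes):
    """Mimic the length rule: a constant fill needs length 1 or one value per box."""
    if len(fill) not in (1, n_boxes):
        raise ValueError(
            f"Aesthetics must be either length 1 or the same as the data ({n_boxes}): fill"
        )
    # Recycle a single colour across all boxes; otherwise use one colour per box.
    return fill if len(fill) == n_boxes else fill * n_boxes


def map_fill(values, palette=SET1):
    """Assign palette colours to the distinct levels of `values`, in sorted level order."""
    levels = sorted(set(values))
    if len(levels) > len(palette):
        raise ValueError(f"{len(levels)} levels but only {len(palette)} colours")
    return {level: palette[i] for i, level in enumerate(levels)}


healthy = [0, 1] * 50
print(map_fill(healthy))  # {0: '#E41A1C', 1: '#377EB8'}
```

Mapping by level means the number of colours used always follows the data, which is why aes(fill = factor(Healthy)) avoids the length mismatch.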

geom_nodelabel_repel() position for circular ggraph plot

I have a circular network diagram (image not included).
I made it using ggraph and added the labels using geom_nodelabel_repel() from ggnetwork:
( ggraph_plot <- ggraph(layout) +
geom_edge_fan(aes(color = as.factor(responses), edge_width = as.factor(responses))) +
geom_node_point(aes(color = as.factor(group)), size = 10) +
geom_nodelabel_repel(aes(label = name, x=x, y=y), segment.size = 1, segment.color = "black", size = 5) +
scale_color_manual("Group", values = c("#2b83ba", "#d7191c", "#fdae61")) +
scale_edge_color_manual("Frequency of Communication", values = c("Once a week or more" = "#444444","Monthly" = "#777777",
"Once every 3 months" = "#888888", "Once a year" = "#999999"),
limits = c("Once a week or more", "Monthly", "Once every 3 months", "Once a year")) +
scale_edge_width_manual("Frequency of Communication", values = c("Once a week or more" = 3,"Monthly" = 2,
"Once every 3 months" = 1, "Once a year" = 0.25),
limits = c("Once a week or more", "Monthly", "Once every 3 months", "Once a year")) +
theme_void() +
theme(legend.text = element_text(size=16, face="bold"),
legend.title = element_text(size=16, face="bold")) )
I want to have the labels on the left side of the plot be off to the left, and the labels on the right side of the plot to be off to the right. I want to do this because the actual labels are quite long (organization names) and they get in the way of the lines in the actual plot.
How can I do this using geom_nodelabel_repel()? I've tried different combinations of box.padding and point.padding, as well as hjust and vjust, but these apply to all labels, and there doesn't seem to be a way to subset or position specific points.
Apologies for not providing a reproducible example but I wasn't sure how to do this without compromising the identities of respondents from my survey.
There is always the manually intensive yet effective method of adding geom_nodelabel_repel() separately for the nodes on the "left" and on the "right" of the plot. It's not at all elegant and probably bad coding practice, but I've done similar things myself when I couldn't find an elegant solution. It works well when your dataset isn't very large and you don't plan to make the same plot over and over again. Basically, it would entail:
Identifying whether some property in your dataset places points on the "left" vs. the "right". In this case it doesn't look like it, so you would have to create a list manually of the entries on the "left" and "right" of your plot.
Using separate calls to geom_nodelabel_repel() with different nudge_x values. Use any reasonable method to subset the "left" and "right" data points: you can create a new column in the dataset, or subset in-line, e.g. data = subset(your.data.frame, property %in% left.list).
For example, if you created a column called subset.side, being either "left" or "right", in your data frame (here: your.data.frame), your calls to geom_nodelabel_repel() might look something like:
geom_nodelabel_repel(
data=subset(your.data.frame, subset.side=='left'),
aes(label=name, x=x, y=y), segment.size=1, segment.color='black', size=5,
nudge_x=-10
) +
geom_nodelabel_repel(
data=subset(your.data.frame, subset.side=='right'),
aes(label=name, x=x, y=y), segment.size=1, segment.color='black', size=5,
nudge_x=10
) +
Alternatively, you can create lists based on the label names themselves. If you called those lists names.left and names.right, you can subset accordingly, as in the pseudocode below:
geom_nodelabel_repel(
data=subset(your.data.frame, name %in% names.left),...
nudge_x = -10, ...
) +
geom_nodelabel_repel(
data=subset(your.data.frame, name %in% names.right),...
nudge_x = 10, ...
)
To be fair, I have not worked with the node geoms before, so I am assuming here that the positioning of the labels will not affect the mapping (as it would not with other geoms).
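For a circular layout, the side of each node can often be derived from the layout coordinates instead of being listed by hand: nodes with a negative x coordinate sit on the left. That rule could fill the subset.side column described above.

The Python sketch below illustrates the rule. It is not ggraph or ggnetwork code, and circular_layout and label_placement are hypothetical helpers. For each node it computes a side, an outward nudge_x, and an hjust (ggplot2 convention: 1 = right-aligned, 0 = left-aligned).

One caveat on scale: nudge values are in data units. If the layout uses a unit circle (an assumption), a nudge of 10 would push labels far outside the plot, so the nudge should be scaled to the coordinate range.

```python
import math


def circular_layout(names, radius=1.0):
    """Hypothetical stand-in for a circular layout: node i sits at angle 2*pi*i/n."""
    n = len(names)
    return {
        name: (radius * math.cos(2 * math.pi * i / n), radius * math.sin(2 * math.pi * i / n))
        for i, name in enumerate(names)
    }


def label_placement(layout, nudge=0.5, eps=1e-9):
    """Split nodes by the sign of x and give each side an outward nudge and a matching hjust."""
    placement = {}
    for name, (x, _y) in layout.items():
        left = x < -eps  # nodes on the vertical centre line count as "right"
        placement[name] = {
            "side": "left" if left else "right",
            "nudge_x": -nudge if left else nudge,
            "hjust": 1 if left else 0,  # right-align left labels so they extend away from the circle
        }
    return placement


names = ["A", "B", "C", "D", "E", "F"]
placement = label_placement(circular_layout(names))
for name, p in placement.items():
    print(name, p["side"], p["nudge_x"], p["hjust"])
```

With six nodes evenly spaced, A, B and F land on the right and C, D and E on the left. The two resulting subsets would then feed the two separate label layers from the answer.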

Storing pivots as variables? Pine Script

Is there any way to store, let's say, the 5 latest pivots as variables?
There is a simple built-in indicator called Pivot H/L that finds pivots and places plot shapes near them. Is there a way to store the pivots instead of plotting the shapes?
Yes, you can. The code is a bit ugly, but it works. As an example, I show it here for the current and the last two pivot highs. You can extend it as far as you want by creating more variables analogous to these.
If you're interested in the pivot lows, just replicate everything for the low pivots (using pivotlow()).
In the code below you'll find:
ph0: the most recently found pivot high
ph1: the pivot high found before ph0
ph2: the pivot high found before ph1
This roughly mirrors the way we reference past values of series variables in Pine Script with the [] history operator. Here is the code:
//@version=4
study("Trend", overlay=false)

// INPUT VARIABLES
leftBars = input(3)
rightBars = input(3)

// INIT VARIABLES: `var` keeps the values from bar to bar instead of resetting them
var float ph0 = na  // most recent pivot high
var float ph1 = na  // pivot high before ph0
var float ph2 = na  // pivot high before ph1

// pivothigh() returns the pivot price on the bar where the pivot is confirmed, else na
ph = pivothigh(high, leftBars, rightBars)

// When a new pivot is confirmed, shift the older values down and store the new one
if not na(ph)
    ph2 := ph1
    ph1 := ph0
    ph0 := ph

plot(ph0)
plot(ph1)
plot(ph2)
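The shift-down logic generalizes to any number of stored pivots. Below is a minimal Python sketch of the same idea outside Pine Script, with these assumptions:
- It detects pivot highs with left_bars/right_bars using a strict "higher than all neighbours" rule. This may differ from pivothigh() when highs are tied.
- It keeps the newest `keep` values in a bounded deque, so position 0 plays the role of ph0.
- The function names and the sample highs are hypothetical.

```python
from collections import deque


def pivot_highs(highs, left_bars, right_bars):
    """List aligned with `highs`: the pivot value on the bar where it is confirmed
    (right_bars bars after the pivot itself), otherwise None."""
    confirmed = [None] * len(highs)
    for i in range(left_bars, len(highs) - right_bars):
        neighbours = highs[i - left_bars:i] + highs[i + 1:i + right_bars + 1]
        if all(highs[i] > h for h in neighbours):  # strict rule: an assumption about ties
            confirmed[i + right_bars] = highs[i]
    return confirmed


def latest_pivots(highs, left_bars=3, right_bars=3, keep=3):
    """For every bar, a tuple of the `keep` most recent confirmed pivot highs, newest
    first: position 0 acts like ph0, position 1 like ph1, and so on."""
    recent = deque(maxlen=keep)  # once full, appendleft discards the oldest value
    history = []
    for pivot in pivot_highs(highs, left_bars, right_bars):
        if pivot is not None:
            recent.appendleft(pivot)
        history.append(tuple(recent))
    return history


highs = [1, 2, 5, 2, 1, 1, 3, 7, 3, 1, 1, 4, 9, 4, 1, 1]
history = latest_pivots(highs, left_bars=2, right_bars=2, keep=5)
print(history[-1])
```

Setting keep=5 covers the five latest pivots from the question without declaring one variable per pivot. In Pine Script itself, the variable-per-pivot version above stays the straightforward option.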

Data Selection - Finding relations between dataframe attributes

Let's say I have a dataframe with 80 columns and 1 target column,
for example a bank account table with 80 attributes for each record (account) and 1 target column that indicates whether the client stays or leaves.
What steps and algorithms should I follow to select the columns with the highest impact on the target column?
There are a number of steps you can take; here are some examples to get you started:
A correlation coefficient, such as Pearson's r (for parametric data) or Spearman's rho (for ordinal data).
Feature importances. I like XGBoost for this, as it includes the handy xgb.ggplot.importance (R) / xgb.plot_importance (Python) methods.
One of the many feature selection options, such as Python's sklearn.feature_selection methods.
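The correlation approach in the first item can be sketched in a few lines: compute the correlation of each column with the target and sort by absolute value.

The Python below is a stdlib-only sketch. The names pearson, ranks, spearman and rank_features are illustrative, not from any library, and the toy columns are made up. With a binary stay/leave target, Pearson's r is the point-biserial correlation.

```python
import math


def pearson(x, y):
    """Pearson's r; returns 0.0 when either column is constant (r is undefined there)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the ranks."""
    return pearson(ranks(x), ranks(y))


def rank_features(columns, target, method="pearson"):
    """Return (column, correlation) pairs sorted by absolute correlation, strongest first."""
    corr = spearman if method == "spearman" else pearson
    scores = [(name, corr(values, target)) for name, values in columns.items()]
    return sorted(scores, key=lambda item: abs(item[1]), reverse=True)


# Toy example: target is 1 if the client leaves; "a" rises with it, "b" falls, "c" is constant.
columns = {"a": [1, 1, 3, 3, 1, 3], "b": [1, 1, 0, 0, 1, 0], "c": [5] * 6}
target = [0, 0, 1, 1, 0, 1]
for name, r in rank_features(columns, target):
    print(name, round(r, 3))
```

Correlation only captures linear (Pearson) or monotonic (Spearman) association with one column at a time. It is a quick first filter that complements model-based importances rather than replacing them.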
This is one way to do it using the Pearson correlation coefficient in R. I used it once when exploring the red_wine dataset: my target variable was quality, and I wanted to know the effect of the other columns on it.
In the resulting plot (figure not included here), blue represents positive relations and red negative ones, and the closer the value is to 1 or -1, the darker the colour.
library(dplyr)
library(corrplot)

cor_matrix <- cor(
  red_wine %>%
    # first we remove unwanted columns
    dplyr::select(-X, -rating) %>%
    mutate(
      # now we translate quality to a number
      quality = as.numeric(quality)
    )
)
corrplot(cor_matrix, method = "color", type = "lower", addCoef.col = "gray",
         title = "Red Wine Variables Correlations", mar = c(0, 0, 1, 0),
         tl.cex = 0.7, tl.col = "black", number.cex = 0.9)