ggplot: Reorder Facet rows in facet.grid - ggplot2

I am having issues reorganizing my rows using facet.grid. I have four groups that by default are organized alphabetically. Using the factor command I can organize the groups - but they switch to columns (horizontal) and I need rows (vertical).
Example of subset of my data
Site Date Analyte Result
SJR#Calaveres 5/26/22 Top TP 0.35
SJR#Ship Channel 5/26/22 Top TP 0.56
Turning Basin 5/26/22 Top TP 0.46
Morelli Boat Ramp 5/26/22 Top TP 0.45
Public Dock 5/26/22 Top TP 0.55
wi_daily_wq<-read.csv("nutrients_p.csv")
p <- ggplot(data = wi_daily_wq, aes(x = Date, y = Result))+
geom_point(aes(color = Site)) +
theme_bw()
p
# Add vertical facets, aka divide the plot up vertically since they share an x axis
p + facet_grid(Analyte ~ .)
# Add vertical facets, but scale only the y axes freely
p + facet_grid(Analyte ~ ., scales = "free_y")
p + facet_grid(Analyte ~ ., scales = "free_y",
switch = "y") # flip the facet labels along the y axis from the right side to the left
#Trying to organize rows in order that I want
p + facet_grid(~factor (Analyte,levels=c('Bottom TP','Top TP','Bottom PO4','Top PO4')))
ylab(NULL) + # remove the word "values"
theme(strip.background = element_blank(), # remove the background
strip.placement = "outside") # put labels to the left of the axis text
If I remove the following argument from the above code I get the row plot I like. But, the rows are just in alphabetical order, not the order I need them in.
p + facet_grid(~factor (Analyte,levels=c('Bottom TP','Top TP','Bottom PO4','Top PO4')))
I'm wondering what adjustment I need to make to my code to organize the rows in the order I need them without converting the facets to columns.

Related

Aesthetics : fill - Warning ocours when ploting gg Boxplot

Ive written a R-chunk which should provide me a coloured ggplot boxplot. All needed templates are loaded, so is the Data.
The Data for „Healthy“ & „BodyTemperature“ is based inside the Data „Hospital“.
For Healthy there can be only 0 oder 1.
It should plott two Boxplots next to each other on the x-axis, one showing Healthy (0) the other one Unhealthy (1) compared to the BodyTemperature of the patients on y-axis.
The Boxplot should be coloured with the Template „Brewer“.
Everytime i try to run this chunk, a warning occours. Whats the solution?
colour:
colour <- brewer.pal(n = 2, name = "Set1")
colour
Warnung: minimal value for n is 3, returning requested palette with 3 different levels
[1] "#E41A1C" "#377EB8" "#4DAF4A"
R-Chunk
colour = brewer.pal(n = 2, name = "Set1")
ggplot(Hospital, aes(x = Healthy, y = BodyTemperature)) +
geom_boxplot(fill=c(colour)) +
ylab("Temperature") +
xlab("Healthy") +
ggtitle("Health compared to Temperature")
Warning ocours:
Error in `check_aesthetics()`:
! Aesthetics must be either length 1 or the same as the data (1): fill
Backtrace:
1. base (local) `<fn>`(x)
2. ggplot2:::print.ggplot(x)
4. ggplot2:::ggplot_build.ggplot(x)
5. ggplot2 (local) by_layer(function(l, d) l$compute_geom_2(d))
6. ggplot2 (local) f(l = layers[[i]], d = data[[i]])
7. l$compute_geom_2(d)
8. ggplot2 (local) f(..., self = self)
9. self$geom$use_defaults(data, self$aes_params, modifiers)
10. ggplot2 (local) f(..., self = self)
11. ggplot2:::check_aesthetics(params[aes_params], nrow(data))
Error in check_aesthetics(params[aes_params], nrow(data)) :
As you want to color your boxplots by the value of Healthy you could do so by mapping Healthy on the fill aesthetic. Also to use one of the Brewer palettes ggplot2 already offers some convenience functions which in case of the fill aes is called scale_fill_brewer. Not sure whether you want a legend but IMHO it does not make sense so I removed it via guides. Finally as you provided no data it's not clear whether your Healthy column is a numeric or a categorical variable. For this reason I wrapped in factor to make it categorical.
Using some fake random example data:
set.seed(123)
library(ggplot2)
Hospital <- data.frame(
Healthy = rep(c(0, 1), 50),
BodyTemperature = runif(100)
)
ggplot(Hospital, aes(x = factor(Healthy), y = BodyTemperature)) +
geom_boxplot(aes(fill = factor(Healthy))) +
scale_fill_brewer(palette = "Set1") +
ylab("Temperature") +
xlab("Healthy") +
ggtitle("Health compared to Temperature") +
guides(fill = "none")

geom_nodelabel_repel() position for circular ggraph plot

I have a network diagram that looks like this:
I made it using ggraph and added the labels using geom_nodelabel_repel() from ggnetwork:
( ggraph_plot <- ggraph(layout) +
geom_edge_fan(aes(color = as.factor(responses), edge_width = as.factor(responses))) +
geom_node_point(aes(color = as.factor(group)), size = 10) +
geom_nodelabel_repel(aes(label = name, x=x, y=y), segment.size = 1, segment.color = "black", size = 5) +
scale_color_manual("Group", values = c("#2b83ba", "#d7191c", "#fdae61")) +
scale_edge_color_manual("Frequency of Communication", values = c("Once a week or more" = "#444444","Monthly" = "#777777",
"Once every 3 months" = "#888888", "Once a year" = "#999999"),
limits = c("Once a week or more", "Monthly", "Once every 3 months", "Once a year")) +
scale_edge_width_manual("Frequency of Communication", values = c("Once a week or more" = 3,"Monthly" = 2,
"Once every 3 months" = 1, "Once a year" = 0.25),
limits = c("Once a week or more", "Monthly", "Once every 3 months", "Once a year")) +
theme_void() +
theme(legend.text = element_text(size=16, face="bold"),
legend.title = element_text(size=16, face="bold")) )
I want to have the labels on the left side of the plot be off to the left, and the labels on the right side of the plot to be off to the right. I want to do this because the actual labels are quite long (organization names) and they get in the way of the lines in the actual plot.
How can I do this using geom_nodelabel_repel()? i've tried different combinations of box_padding and point_padding, as well as h_just and v_just but these apply to all labels and it doesn't seem like there is a way to subset or position specific points.
Apologies for not providing a reproducible example but I wasn't sure how to do this without compromising the identities of respondents from my survey.
Well, there is always the manually-intensive, yet effective method of separately adding the geom_node_label_repel function for the nodes on the "left" vs. the "right" of the plot. It's not at all elegant and probably bad coding practice, but I've done similar things myself when I can't figure out an elegant solution. It works really well when you don't have a very large dataset to begin with and if you are not planning to make the same plot over and over again. Basically, it would entail:
Identifying if there exists a property in your dataset that places points on the "left" vs. the "right". In this case, it doesn't look like it, so you would just have to create a list manually of those entries on the "left" vs. "right" of your plot.
Using separate calls to geom_node_label_repel with different nudge_x values. Use any reasonable method to subset the "left" and "right datapoints. You can create a new column in the dataset, or use formatting in-line like data = subset(your.data.frame, property %in% left.list)
For example, if you created a column called subset.side, being either "left" or "right" in your data.frame (here: your.data.frame), your calls to geom_node_label_repel might look something like:
geom_node_label_repel(
data=subset(your.data.frame, subset.side=='left'),
aes(label=name, x=x, y=y), segment.size=1, segment.color='black', size=5,
nudge_x=-10
) +
geom_node_label_repel(
data=subset(your.data.frame, subset.side=='right'),
aes(label=name, x=x, y=y), segment.size=1, segment.color='black', size=5,
nudge_x=10
) +
Alternatively, you can create a list based on the label name itself--let's say you called those lists names.left and names.right, where you can subset accordingly by swapping in as represented in the pseudo code below:
geom_node_label_repel(
data=subset(your.data.frame, name %in% names.left),...
nudge_x = -10, ...
) +
geom_node_label_repel(
data=subset(your.data.frame, name %in% names.right),...
nudge_x = 10, ...
)
To be fair, I have not worked with the node geoms before, so I am assuming here that the positioning of the labels will not affect the mapping (as it would not with other geoms).

Tukey-Test Grouping and plotting in SciPy

I'm trying to plot results from a Tukey test, but I am struggling with putting data into groups based on a P-Value. This is the equivalent in R which I am trying to replicate. I have been using the SciPy one-way ANOVA tests and the Tukey test statsmodel but can't get these groups done in the same way.
Any help is greatly appreciated
I've also just found this another example in R of what I want to do in python
I have been struggling to do the same thing. I found a paper that tells you how to code the letters.
Hans-Peter Piepho (2004) An Algorithm for a Letter-Based Representation of All-Pairwise Comparisons, Journal of Computational and Graphical Statistics, 13:2, 456-466, DOI: 10.1198/1061860043515
Doing the coding was a little tricky as you need to check and replicate columns and then combine columns. I tried to add some comments to the colde. I figured out a method where you can run tukeyhsd and then from the results compute the letters. It should be possible to turn this into a function. Or hopefully part of tukeyhsd. My data is not posted but it is a column of data and then a column describing the groups. The groups for me are the five boroughs of NYC. You can also just change the comments and use random data the first time.
# Read data. Comment out the next ones to use random data.
df=pd.read_excel('anova_test.xlsx')
#n=1000
#df = pd.DataFrame(columns=['Groups','Data'],index=np.arange(n))
#df['Groups']=np.random.randint(1, 4,size=n)
#df['Data']=df['Groups']*np.random.random_sample(size=n)
# define columns for data and then grouping
col_to_group='Groups'
col_for_data='Data'
#Now take teh data and regroup for anova
samples = [cols[1] for cols in df.groupby(col_to_group)[col_for_data]] #I am not sure how this works but it makes an numpy array for each group
f_val, p_val = stats.f_oneway(*samples) # I am not sure what this star does but this passes all the numpy arrays correctly
#print('F value: {:.3f}, p value: {:.3f}\n'.format(f_val, p_val))
# this if statement can be uncommmented if you don't won't to go furhter with out p<0.05
#if p_val<0.05: #If the p value is less than 0.05 it then does the tukey
mod = MultiComparison(df[col_for_data], df[col_to_group])
thsd=mod.tukeyhsd()
#print(mod.tukeyhsd())
#this is a function to do Piepho method. AN Alogrithm for a letter based representation of al-pairwise comparisons.
tot=len(thsd.groupsunique)
#make an empty dataframe that is a square matrix of size of the groups. #set first column to 1
df_ltr=pd.DataFrame(np.nan, index=np.arange(tot),columns=np.arange(tot))
df_ltr.iloc[:,0]=1
count=0
df_nms = pd.DataFrame('', index=np.arange(tot), columns=['names']) # I make a dummy dataframe to put axis labels into. sd stands for signifcant difference
for i in np.arange(tot): #I loop through and make all pairwise comparisons.
for j in np.arange(i+1,tot):
#print('i=',i,'j=',j,thsd.reject[count])
if thsd.reject[count]==True:
for cn in np.arange(tot):
if df_ltr.iloc[i,cn]==1 and df_ltr.iloc[j,cn]==1: #If the column contains both i and j shift and duplicat
df_ltr=pd.concat([df_ltr.iloc[:,:cn+1],df_ltr.iloc[:,cn+1:].T.shift().T],axis=1)
df_ltr.iloc[:,cn+1]=df_ltr.iloc[:,cn]
df_ltr.iloc[i,cn]=0
df_ltr.iloc[j,cn+1]=0
#Now we need to check all columns for abosortpion.
for cleft in np.arange(len(df_ltr.columns)-1):
for cright in np.arange(cleft+1,len(df_ltr.columns)):
if (df_ltr.iloc[:,cleft].isna()).all()==False and (df_ltr.iloc[:,cright].isna()).all()==False:
if (df_ltr.iloc[:,cleft]>=df_ltr.iloc[:,cright]).all()==True:
df_ltr.iloc[:,cright]=0
df_ltr=pd.concat([df_ltr.iloc[:,:cright],df_ltr.iloc[:,cright:].T.shift(-1).T],axis=1)
if (df_ltr.iloc[:,cleft]<=df_ltr.iloc[:,cright]).all()==True:
df_ltr.iloc[:,cleft]=0
df_ltr=pd.concat([df_ltr.iloc[:,:cleft],df_ltr.iloc[:,cleft:].T.shift(-1).T],axis=1)
count+=1
#I sort so that the first column becomes A
df_ltr=df_ltr.sort_values(by=list(df_ltr.columns),axis=1,ascending=False)
# I assign letters to each column
for cn in np.arange(len(df_ltr.columns)):
df_ltr.iloc[:,cn]=df_ltr.iloc[:,cn].replace(1,chr(97+cn))
df_ltr.iloc[:,cn]=df_ltr.iloc[:,cn].replace(0,'')
df_ltr.iloc[:,cn]=df_ltr.iloc[:,cn].replace(np.nan,'')
#I put all the letters into one string
df_ltr=df_ltr.astype(str)
df_ltr.sum(axis=1)
#print(df_ltr)
#print('\n')
#print(df_ltr.sum(axis=1))
#Now to plot like R with a violing plot
fig,ax=plt.subplots()
df.boxplot(column=col_for_data, by=col_to_group,ax=ax,fontsize=16,showmeans=True
,boxprops=dict(linewidth=2.0),whiskerprops=dict(linewidth=2.0)) #This makes the boxplot
ax.set_ylim([-10,20])
grps=pd.unique(df[col_to_group].values) #Finds the group names
grps.sort() # This is critical! Puts the groups in alphabeical order to make it match the plotting
props=dict(facecolor='white',alpha=1)
for i,grp in enumerate(grps): #I loop through the groups to make the scatters and figure out the axis labels.
x = np.random.normal(i+1, 0.15, size=len(df[df[col_to_group]==grp][col_for_data]))
ax.scatter(x,df[df[col_to_group]==grp][col_for_data],alpha=0.5,s=2)
name="{}\navg={:0.2f}\n(n={})".format(grp
,df[df[col_to_group]==grp][col_for_data].mean()
,df[df[col_to_group]==grp][col_for_data].count())
df_nms['names'][i]=name
ax.text(i+1,ax.get_ylim()[1]*1.1,df_ltr.sum(axis=1)[i],fontsize=10,verticalalignment='top',horizontalalignment='center',bbox=props)
ax.set_xticklabels(df_nms['names'],rotation=0,fontsize=10)
ax.set_title('')
fig.suptitle('')
fig.savefig('anovatest.jpg',dpi=600,bbox_inches='tight')
Results showing the letters above plots using the tukeyhsd
Here is a function that returns letter labels if you have a symmetric matrix of p-values from a Tukey test:
import numpy as np
def tukeyLetters(pp, means=None, alpha=0.05):
'''TUKEYLETTERS - Produce list of group labels for TukeyHSD
letters = TUKEYLETTERS(pp), where PP is a symmetric matrix of
probabilities from a Tukey test, returns alphabetic labels
for each group to indicate clustering. PP may also be a vector
from PAIRWISE_TUKEYHSD.
Optional argument MEANS specifies group means, which is used for
ordering the letters. ("a" gets assigned to the group with lowest
mean.) Without this argument, ordering is arbitrary.
Optional argument ALPHA specifies cutoff for treating groups as
part of the same cluster.'''
if len(pp.shape)==1:
# vector
G = int(3 + np.sqrt(9 - 4*(2-len(pp))))//2
ppp = .5*np.eye(G)
ppp[np.triu_indices(G,1)] = pp
pp = ppp + ppp.T
conn = pp>alpha
G = len(conn)
if np.all(conn):
return ['a' for g in range(G)]
conns = []
for g1 in range(G):
for g2 in range(g1+1,G):
if conn[g1,g2]:
conns.append((g1,g2))
letters = [ [] for g in range(G) ]
nextletter = 0
for g in range(G):
if np.sum(conn[g,:])==1:
letters[g].append(nextletter)
nextletter += 1
while len(conns):
grp = set(conns.pop(0))
for g in range(G):
if all(conn[g, np.sort(list(grp))]):
grp.add(g)
for g in grp:
letters[g].append(nextletter)
for g in grp:
for h in grp:
if (g,h) in conns:
conns.remove((g,h))
nextletter += 1
if means is None:
means = np.arange(G)
means = np.array(means)
groupmeans = []
for k in range(nextletter):
ingroup = [g for g in range(G) if k in letters[g]]
groupmeans.append(means[np.array(ingroup)].mean())
ordr = np.empty(nextletter, int)
ordr[np.argsort(groupmeans)] = np.arange(nextletter)
result = []
for ltr in letters:
lst = [chr(97 + ordr[x]) for x in ltr]
lst.sort()
result.append(''.join(lst))
return result
To make that concrete, here is a full example:
from statsmodels.stats.multicomp import pairwise_tukeyhsd
data = [ 1,2,2,1,4,5,4,5,7,8,7,8,1,3,4,5 ]
group = [ 0,0,0,0,1,1,1,1,2,2,2,2,3,3,3,3 ]
tuk = pairwise_tukeyhsd(data, group)
letters = tukeyLetters(tuk.pvalues)
This will result in letters containing ['a', 'c', 'b', 'ac']

One ggplot from two data frames (1 bar each)

I was looking for an answer everywhere, but I just couldn't find one to this problem (maybe I was just too stupid to use other answers, because I'm new to R).
I have two data frames with different numbers of rows. I want to create a plot containing a single bar per data frame. Both should have the same length and the count of different variables should be stacked over each other. For example: I want to compare the proportions of gender in those to data sets.
t1<-data.frame(cbind(c(1:6), factor(c(1,2,2,1,2,2))))
t2<-data.frame(cbind(c(1:4), factor(c(1,2,2,1))))
1 represents male, 2 represents female
I want to create two barplots next to each other that represent, that the proportions of gender in the first data frame is 2:4 and in the second one 2:2.
My attempt looked like this:
ggplot() + geom_bar(aes(1, t1$X2, position = "fill")) + geom_bar(aes(1, t2$X2, position = "fill"))
That leads to the error: "Error: stat_count() must not be used with a y aesthetic."
First I should merge the two dataframes. You need to add a variable that will identify the origin of the data, add in both dataframes a column with an ID (like t1 and t2). Keep in mind that your columnames are the same in both frames so you will be able to use the function rbind.
t1$data <- "t1"
t2$data <- "t2"
t <- (rbind(t1,t2))
Now you can make the plot:
ggplot(t[order(t$X2),], aes(data, X2, fill=factor(X2))) +
geom_bar(stat="identity", position="stack")

GNUPLOT: dot plot with data depending dot size

I am trying plot data sets consisting of 3 coordinates:
X-coordinate, x-coordinate and the number of occurrences.
example:
1 2 10
3 1 2
3 2 1
I would like to draw for every line a dot at x,y with a diameter which is depending on the third value.
Is that possible with Gnuplot?
Create a 2D plot with variable point size. See the demo.
Example:
plot 'dataFile.dat' u 1:2:3 w points lt 1 pt 10 ps variable
This is basically equivalent to the existing answer, just shorter:
plot 'dataFile.dat' with circles
Credit: Gnuplot: plot with circles of a defined radius